


Assessing Written Communication 
in Higher Education: Review 
and Recommendations 
for Next-Generation Assessment 


Jesse R. Sparks 
Yi Song 

Wyman Brantley 
Ou Lydia Liu 




























December 2014 







ETS Research Report Series 


EIGNOR EXECUTIVE EDITOR 

James Carlson 
Principal Psychometrician 

ASSOCIATE EDITORS 


Beata Beigman Klebanov 
Research Scientist 

Heather Buzick 
Research Scientist 

Brent Bridgeman 

Distinguished Presidential Appointee 

Keelan Evanini 
Managing Research Scientist 

Marna Golub-Smith 
Principal Psychometrician 

Shelby Haberman 

Distinguished Presidential Appointee 


Donald Powers 

Managing Principal Research Scientist 

Gautam Puhan 
Senior Psychometrician 

John Sabatini 

Managing Principal Research Scientist 

Matthias von Davier 
Senior Research Director 

Rebecca Zwick 

Distinguished Presidential Appointee 


PRODUCTION EDITORS 


Kim Fryer
Manager, Editing Services

Ayleen Stellhorn
Editor


Since its 1947 founding, ETS has conducted and disseminated scientific research to support its products and services, and 
to advance the measurement and education fields. In keeping with these goals, ETS is committed to making its research 
freely available to the professional community and to the general public. Published accounts of ETS research, including 
papers in the ETS Research Report series, undergo a formal peer-review process by ETS staff to ensure that they meet 
established scientific and professional standards. All such ETS-conducted peer reviews are in addition to any reviews that 
outside organizations may provide as part of their own publication processes. Peer review notwithstanding, the positions 
expressed in the ETS Research Report series and other published accounts of ETS research are those of the authors and 
not necessarily those of the Officers and Trustees of Educational Testing Service. 

The Daniel Eignor Editorship is named in honor of Dr. Daniel R. Eignor, who from 2001 until 2011 served the Research and 
Development division as Editor for the ETS Research Report series. The Eignor Editorship has been created to recognize 
the pivotal leadership role that Dr. Eignor played in the research publication process at ETS. 




ETS Research Report Series ISSN 2330-8516 


RESEARCH REPORT 

Assessing Written Communication in Higher Education: 
Review and Recommendations for Next-Generation 
Assessment 

Jesse R. Sparks, Yi Song, Wyman Brantley, & Ou Lydia Liu 

Educational Testing Service, Princeton, NJ 


Written communication is considered one of the most critical competencies for academic and career success, as evident in surveys of 
stakeholders from higher education and the workforce. Emphasis on writing skills suggests the need for next-generation assessments 
of writing proficiency to inform curricular and instructional improvement. This article presents a comprehensive review of definitions 
of writing proficiency from key higher education and workforce frameworks; the strengths and weaknesses of existing assessments; and 
challenges related to designing, implementing, and interpreting such assessments. Consistent with extant frameworks, we propose an 
operational definition including 4 strands of skills: (a) social and rhetorical knowledge, (b) domain knowledge and conceptual strategies,
(c) language use and conventions, and (d) the writing process. Measuring these aspects of writing requires multiple assessment
formats (including selected-response [SR] and constructed-response [CR] tasks) to balance construct coverage and test reliability.
Next-generation assessments should balance authenticity (e.g., realistic writing tasks) and psychometric quality (e.g., desirable measurement
properties), while providing institutions and faculty with actionable data. The review and operational definition presented here should 
serve as an important resource for institutions that seek to either adopt or design an assessment of students’ writing proficiency. 

Keywords Writing; writing assessment; higher education; written composition; communication; student learning outcomes 

Effective communication is fundamental to success in many aspects of life. Scholars such as Dewey (1938/1997) have
acknowledged the importance of language as a primary medium through which learning takes place in educational and 
everyday experiences, asserting that “all human experience is ultimately social... it involves contact and communication”
(p. 38). In order to interact successfully with others in academic, workplace, and community settings, individuals
must be able to communicate — to convey or exchange information, knowledge, and ideas — clearly and effectively. Young 
learners begin to develop their communication skills in oral contexts, but as they progress through K-12, writing skills 
become increasingly important, shifting in emphasis from the development of foundational print literacy and transcription
skills, to composing narratives about one’s experiences, to expositions or analyses of phenomena, and ultimately to
more sophisticated tasks, such as writing arguments or research reports. 

The ability to write effectively using standard written English is particularly important in higher education, where 
proficiency with written communication is considered a critical student learning outcome (SLO). A survey conducted 
by the Association of American Colleges and Universities (AAC&U, 2011) found that 99% of the chief academic officers 
from 433 higher education institutions rated writing as one of the most important intellectual skills for their students. 
More recently, the Educational Testing Service (ETS, 2013a) conducted interviews with provosts or vice presidents of 
academic affairs from more than 200 institutions regarding the most commonly measured general education skills, finding 
that written communication was the most frequently mentioned competency considered by respondents as critical for 
both academic and career success. The focus on written communication is also apparent internationally. Notably, written 
communication is included as a generic skill expected of all students in the Assessment of Higher Education Learning 
Outcomes (AHELO) project, an effort to evaluate general learning outcomes of college students across nations, which is 
sponsored by the Organisation for Economic Co-operation and Development (OECD, 2012). 

Reports from the workforce echo those from higher education. Written communication was among the most desired skills
mentioned by a sample of 431 employers from various industries surveyed by the Conference Board (Casner-Lotto & 
Barrington, 2006); over 93% of respondents reported that written communication was “very important” (p. 41) for the
workplace, yet 28% of respondents rated the writing skills of 4-year college graduates entering the workforce as “deficient” 
(p. 41). Further, 89% of 302 employers surveyed by the AAC&U (2011) said that colleges and universities should place 
more emphasis on communication skills, the highest endorsement of any skill included in the survey. Written communication
skills are crucial for the workplace, yet many employers perceive college graduates as being underprepared for the
writing tasks required at work. By contrast, college graduates report that learning to write effectively is one of the most 
important skills learned in their undergraduate career (e.g., Krahn & Silzer, 1995). 

These discrepancies in perceptions across stakeholders underscore the need for valid, reliable assessments of written 
communication as a learning outcome that can provide institutions, employers, and individual students with meaningful 
information about students’ skills. Recent calls for assessment reform also reflect the importance of designing assessments 
that have instructional relevance, provide feedback to teachers and students, and can be used to improve curriculum (Gordon
Commission, 2013). A next-generation assessment of written communication competency at the higher education
level could be used to inform revisions to curriculum and instruction in the service of developing students’ writing skills, 
to make effective hiring decisions, and/or to provide students with feedback about their preparation for future academic or 
workforce pursuits. Such an assessment should be based on a precise definition of the written communication construct, 
which is supported by and consistent with current empirical research on writing in higher education. 

Although there is general agreement that effective communication skills (both oral and written) are important, there is 
some ambiguity about how this competency should be defined. Markle, Brenneman, Jackson, Burrus, and Robbins (2013) 
reviewed definitions of effective communication from seven key frameworks of general education competencies in higher 
education. Based on this synthesis, the authors defined this competency as the ability to “effectively communicate multiple 
types of messages; communicate across multiple forms; and effectively deliver messages to varying audiences” (p. 16). This 
definition highlights three aspects of communication: the message’s type (i.e., genre), form (i.e., medium), and recipient. 
Understanding these aspects of communication is important in both oral and written modalities. However, these aspects 
alone may not fully delineate the range of skills that specifically constitute proficiency with written communication. 

The overwhelming emphasis on written communication among stakeholders suggests a need to examine existing 
frameworks, focusing on outcomes specific to writing. As with communication in general, definitions of writing skill 
vary across frameworks. Similarly, existing writing assessments vary in the extent to which they are designed to measure 
particular skills. For example, the writing component of the TOEFL® test is designed to measure writing in English as a 
second or foreign language, with particular attention to the integration of reading, writing, and listening skills, and use 
of particular rhetorical forms, such as summary or description (Cumming, Kantor, Powers, Santos, & Taylor, 2000); the 
particular configuration of writing skills assessed in the TOEFL test is consistent, but not completely overlapping, with the 
writing skills that might be targeted in an assessment of writing as an SLO. Thus, despite the apparent consensus on the 
importance of written communication as a critical competency, there are multiple definitions of what constitutes effective 
writing at the college level. For the purposes of designing and building next-generation assessments of written communication
for higher education, a clear construct definition is needed. A primary goal of this article is to provide such a
definition. A secondary goal is to identify and discuss the issues and challenges that must be considered when designing 
an assessment of written communication as a learning outcome. 

In the first section of this report, we review existing definitions and frameworks of written communication in higher 
education. We also discuss models and theories from the field of writing research that can inform our definition of this 
construct. In the second part of this report, we review current assessments of written communication with respect to 
construct coverage, item formats, and reliability and validity evidence. We then discuss challenges in designing written 
communication assessments, including use of automated scoring techniques, and consider their relevance to curriculum 
and instruction. In the final section of this report, based on a synthesis of the frameworks reviewed, we propose an operational
definition for a next-generation assessment of written communication; this definition is specifically intended to
support the development of assessments of this particular SLO in higher education contexts. We also provide examples 
of item types designed to assess key writing skills. 

In particular, the review of existing written communication assessments presented in the second part of this report 
is intended to aid higher education institutions in choosing among alternative assessments. Evidence suggests that the 
institutional emphasis on assessment of SLOs continues to increase, with learning institutions turning to a wide variety of 
assessments and approaches to meet demands for accountability (Kuh, Jankowski, Ikenberry, & Kinzie, 2014). Navigating 
the landscape of available instruments and assessment methods poses a challenge for higher education institutions, so the
current synthesis is intended to serve as a helpful guide. We also hope that the approach to designing a next-generation 
written communication assessment described here will serve as a resource for institutions in developing their own writing 
assessments. Building collaborative partnerships between higher education institutions and testing organizations in the 
assessment design process can help ensure that SLO assessments meet standards of technical quality while maximizing 
instructional relevance. 


Review of Existing Frameworks and Research 
Definitions of Written Communication in Key Frameworks 

Written communication involves the ability to effectively convey multiple types of messages, in multiple forms, to varying 
audiences, through a written medium (see Markle et al., 2013). However, writing is a multifaceted construct and is defined
differently among various sources. Notions of what constitutes quality writing vary even among experts (Behizadeh, 2014). 
As emphasized by Murphy and Yancey (2008), arguments for the use of particular techniques for assessing students’ 
writing are often based on competing theories about the nature of the writing construct—as a set of discrete skills, as a 
cognitive (or instructional) process that takes place over time, and more recently, as a meaning-making and highly social 
activity that varies across contexts and purposes for writing (p. 449). Since these various perspectives affect assessment 
design decisions, it is critical to determine consistencies among stakeholders’ views of the underlying construct. 

Table 1 presents definitions of written communication 1 drawn from nine key frameworks, including the Council 
of Writing Program Administrators (CWPA), National Council of Teachers of English (NCTE), and National Writing 
Project (NWP)’s Framework for Success in Postsecondary Writing (2011); the National Institutes of Health (NIH)’s 
definition of communication competency (OHR-NIH, 2014); the Quality Assurance Agency for Higher Education’s 
Framework for Higher Education Qualifications (QAA-FHEQ); AAC&U’s Liberal Education and America’s Promise 
(LEAP) VALUE rubrics (Rhodes, 2010); Lumina’s Degree Qualifications Profile (Adelman, Ewell, Gaston, & Schneider, 
2011); the U.S. Department of Labor’s Employment and Training Administration (US-DOL, 2014) Industry Competency
Model Clearinghouse; European Commission’s European Higher Education Area (EHEA) Competencies (i.e.,
the Bologna Framework; European Higher Education Area, 2005; Gonzalez & Wagenaar, 2003); the Council for the 
Advancement of Standards in Higher Education (CAS)’s Framework for Learning and Development Outcomes (CAS, 
2009); and the Assessment and Teaching of 21st-Century Skills KSAVE frameworks (Binkley et al., 2010). 

Table 2 shows the correspondence between each framework reviewed and different dimensions of the writing construct 
mentioned within the various definitions. Definitions and learning outcomes across the various frameworks show some 
degree of consistency, but, interestingly, the configuration of features thought to underlie skilled writing at the college level 
varies such that no two frameworks define the construct in exactly the same way. Importantly for our present purposes, no 
single assessment of college writing has been designed on the basis of any of these frameworks or on a synthesis of them; 
these frameworks suggest learning and assessment targets but have not directly informed the development of specific 
large-scale assessments. We explore the relationships between existing assessments and aspects of the writing construct 
in the second part of this report. 

Key Dimensions of Written Communication 

Members of the CWPA, NCTE, and NWP (2011) collaborated to develop a framework describing the rhetorical and 21st 
century skills required for success in reading, critical thinking, and writing at the college level. This Framework for Success 
in Postsecondary Writing is intended to describe college readiness, or the expected knowledge, skills, and abilities of a 
student who has completed a first-year composition course in college and who demonstrates readiness to take on more 
advanced intellectual work in further academic or career settings. Specifically, the Framework organizes literacy skills into 
five dimensions: rhetorical knowledge (including understanding of various purposes, audiences, contexts, genres, and 
forms of writing), critical thinking (including analysis of reading materials, evaluating information sources’ usefulness 
and reliability, and using research to support writing), writing processes (including planning, drafting, editing, revising, 
and responding to feedback), knowledge of conventions (including both surface-level grammatical conventions and more 
global concerns related to discourse content, organization, tone, and style), and composing in multiple environments (e.g., 

Framework: Framework for Success in Postsecondary Writing
Author/Sponsor: Council of Writing Program Administrators, National Council of Teachers of English,
Written communication (or equivalent) definition: Rhetorical knowledge: The ability to analyze and act on understandings of audiences, purposes, contexts, genres, and forms in creating texts. This includes learning to compose a variety of texts for different disciplines and purposes. Critical thinking: The ability to analyze a situation or text and make thoughtful decisions based on

Table 2 Mapping of Written Communication Skills to Key Frameworks

Dimensions of writing construct                   CWPA, NCTE, & WPA Framework  NIH-OHR  ATC21S  DQP  DOL-ETA  BOLOGNA  QAA  CAS  LEAP
Context and purpose                               X                            ~        X       X    X        -        ~    X    X
Audience awareness                                X                            X        ~       X    X        X        X    ~    X
Content development and organization              X                            X        ~       -    X        X        X    X    X
Genre conventions (text types/formats)            X                            X        X       X    X        X        X    X    X
Disciplinary conventions (major/field)            X                            ~        ~       X    ~        X        X    ~    X
Use of sources and textual evidence               X                            ~        -       X    ~        X        -    -    X
Processes (planning, drafting, revision)          X                            -        X       -    -        -        -    ~    -
Modes and forms (multimedia, digital)             X                            X        X       X    X        X        X    X    X
Word choice, tone, voice, and style of language   X                            ~        X       ~    X        ~        ~    ~    ~
Language use, grammar, syntax, and mechanics      X                            ~        X       X    X        ~        ~    X    X

Note. X = mentioned as part of framework definition or rubric statements; ~ = indirectly mentioned as part of framework definition
or rubric statements (i.e., statement that could be related to a dimension); - = not mentioned in the framework.


using traditional and digital production modes, and incorporating electronic sources in the written work; see Table 1). 
These five dimensions correspond nicely to the aspects of writing in higher education and workforce frameworks, and they 
appear to encompass all of the critical elements of written communication; accordingly, the following review is organized 
around these five dimensions. Importantly, this Framework includes critical engagement with and use of sources and 
emphasis on the writing process, two skills that are infrequently mentioned across the set of frameworks we reviewed.
The Framework also highlights connections among reading, critical thinking, and the development of skilled writing; the 
interconnected nature of these literacy skills is widely acknowledged (e.g., NCTE-WPA, 2010). 

Rhetorical Knowledge of Forms/Modes, Genres, and Disciplines 

The most prevalent dimension across the frameworks we reviewed concerns skill in handling different forms of written 
products, with each of the nine frameworks including some attention to different types of communication. As defined in 
the AAC&U’s LEAP VALUE rubrics, written communication “can involve working with many different writing technologies,
and mixing texts, data, and images” (Rhodes, 2010, p. 1). Accordingly, frameworks emphasize that college graduates
should be able to proficiently integrate multimedia (e.g., visual aids, charts, graphs, and images) to support comprehension 
of complex written material (Binkley et al., 2010); to “use effective communication channels and methods” (OHR-NIH, 
2014, p. 1) including social media and electronic distribution; and to produce a variety of written forms, including letters, 
essays, e-mails, websites, reports, or presentations. This view of writing incorporates the notion of multiliteracies (Cope 
& Kalantzis, 2000), which emphasizes multilingual and multimodal literacy as critical in the 21st century. Thus, writing 
involves producing text using a variety of communication technologies, media, and dissemination channels. 

With respect to genres of writing, the frameworks place particular emphasis on the genre of argument (mentioned
in four of nine frameworks), which requires skill in presenting clear, coherent, and effective arguments that
are convincing to an audience and that consider others’ perspectives (Binkley et al., 2010). The genre of explanation is 
mentioned less often than argument. Explanations are called for explicitly in the DQP in the form of “explications of 
technical issues and processes” (Adelman et al., 2011, p. 14) and are more indirectly referenced in terms of “effectively 
[articulating] abstract ideas” (CAS, 2009, p. 46) in the CAS outcomes. Narrative, more common in K-12 settings, is 
mentioned in only one framework (Adelman et al., 2011). Other genres include directions, manuals, flow charts, and 
interviews. 

In addition, adherence to disciplinary conventions (i.e., the forms and genres of expression that are valued within a
major field or discipline) is mentioned in five of nine frameworks. Students are expected to be able to conduct inquiry
within their discipline and to correctly use types and techniques of writing that are consistent with the values and expectations
of the field. Genre and disciplinary considerations can be treated as part of a student’s rhetorical knowledge (CWPA
et al., 2011), but could also be considered a part of a student’s conceptual knowledge of the discipline.

Rhetorical Knowledge of Context and Purpose 

Attention to the context and purpose of a writing task is mentioned in a majority of frameworks (six of nine). Example 
purposes include advancing an argument to influence others or designing an approach to solve a problem. Writing should 
be appropriate for the purposes of the writing task, including use of appropriate tone and register (e.g., distinguish between 
formal and informal uses of language; write in a professional and courteous manner appropriate for business purposes). 
Context and purpose are closely related to genre and disciplinary considerations, and they are also a part of students’ 
rhetorical knowledge (CWPA et al., 2011). 

Rhetorical Knowledge of Audience 

Audience awareness is directly mentioned in a majority of frameworks (seven of nine for audience and content) and is 
indirectly mentioned in the remainder. Audience design concerns a writer’s attention to the knowledge, interests, and 
values of the recipient of a communication and skill in tailoring writing and expression to suit that audience (e.g., address 
experts and nonexperts in a specific field; address general and specific audiences). Some frameworks only indirectly mention
audience awareness as part of the writing construct, in that writers should “convey meaning in a way that others
understand” and should “write to influence others” (e.g., CAS, 2009, p. 46). These statements imply the notion of an audience
(i.e., others should understand and be influenced by what is written), but they do not explicitly mention the term,
nor do they indicate what kinds of others the writer might reasonably be expected to address. 

Development and Organization of Content 

Content development and organization is mentioned in seven of nine frameworks and can be defined as “the ways in 
which the text explores and represents its topic in relation to its audience and purpose” (Rhodes, 2010, p. 2). Organization 
involves producing prose that is logical, well structured, and coherent by, for example, moving from general topics to more 
specific ideas and details (CAS, 2009). Content development refers to the extent to which the writer effectively articulates 
abstract ideas and uses adequate supporting details (US-DOL ETA, 2014). When students are engaged in writing about 
something, their skill in developing and organizing that content in a coherent manner is critical for the communication 
to be successful. 

Adherence to Language Conventions 

Attention to language conventions is mentioned in six of nine frameworks. Statements relating to conventions (including 
syntax, grammar, and usage) underscore the idea that, by college, students should be fluent with text production skills 
and be able to compose “substantially error free prose” (Adelman et al., 2011, p. 14) with appropriate syntax and mechanics,
spelling, grammar, and so forth. This includes knowledge of vocabulary, stylistic conventions, and the functions of
language, both at surface and global levels (CWPA et al., 2011). 

Writing From Sources and the Writing Process 

The frameworks reviewed give relatively little attention to two features of written communication emphasized by the 
higher education writing community: (a) critical analysis and use of sources and (b) attention to the writing process. 
Using sources to support writing is included as a major dimension of the LEAP rubrics, suggesting attention to evaluating 
the relevance, quality, and credibility of those sources 2 (Rhodes, 2010); in contrast, the Bologna framework (Gonzalez 
& Wagenaar, 2003) suggests that students should “receive and respond to a variety of information sources” (p. 144) in 
visual, oral (i.e., auditory), and textual formats, while the DQP states that students will be able to conduct inquiry from
non-English language sources 3 (Adelman et al., 2011). The Framework for Success in Postsecondary Writing (CWPA et al., 
2011) deals with writing from sources as an aspect of critical thinking and analysis of text materials, a process of conducting 
research from sources, knowledge of source attribution conventions, and incorporating electronic sources in multimedia 
productions; including elements related to use of sources in four of five dimensions of writing skill suggests that source 
use is particularly important for higher education. 

With respect to the process dimension, skill in monitoring the writing process “from drafting to proof-reading” (Binkley
et al., 2010, p. 22) is an important aspect of writing in ATC21S and is a major dimension of the Framework for Success
in Postsecondary Writing (CWPA et al., 2011), but no other frameworks address this issue. In fact, the LEAP VALUE 
rubrics (Rhodes, 2010) specifically exclude notions of writing processes or strategies from their framework for student 
learning outcomes. However, as underscored by the Framework (CWPA et al., 2011), these strategies and processes are 
a critical aspect of writing at the college level and, thus, should be included in any comprehensive definition of written 
communication. 

To summarize, based on the review of frameworks presented here, it is clear that in defining written communication, 
we must consider facility with multiple types (i.e., genres), forms (i.e., media), and audiences, in addition to the importance
of the context and purpose for writing, and the importance of skill in manipulating both conceptual content (i.e.,
development and organization of ideas, critical analysis, and use of sources) and linguistic information (language, syntax,
and mechanics; tone of voice and register) to suit the current communicative goals. The writing process (planning,
drafting, and revision) is also critical. 

Theoretical Perspectives on Writing From the Research Literature 

It is evident from the frameworks reviewed in the previous section that writing is a complex skill, involving multiple 
dimensions, and that different perspectives on writing may differentially emphasize some of those dimensions. In this 
section, we explore theoretical perspectives that underlie these various dimensions of written communication competency
and the importance of these dimensions for becoming a skilled writer. This survey of extant research literature is
intended to enrich our definition of the construct and to suggest which dimensions might be more or less critical for 
higher education. Consistent with previous efforts to summarize the writing construct (Cumming et al., 2000), the work 
surveyed here suggests that writing involves cognitive processes situated within particular rhetorical or social contexts. 

Sociocognitive Perspectives on Writing 

Both social and cognitive perspectives on writing converge on the notion that writing is, by definition, social and purpose-driven
(e.g., Bereiter & Scardamalia, 1987; Graham & Perin, 2007; Zimmerman & Risemberg, 1997; also see Deane,
2011). Genres of writing, for example, serve specific social goals and purposes (Bazerman, 2004), and those rhetorical
goals shape and constrain the types and methods through which information should be recorded and shared with others
when writing within a particular genre. In higher education, the focus is typically on transactional writing (i.e., writing
to communicate or exchange information, ideas, or arguments with others in order to achieve particular purposes,
such as to inform, persuade, or explain information to others; Burstein et al., 2014). Therefore, writers must consider
the nature and needs of their audience(s) in order for communication to be successful (cf. Clark & Murphy, 1982).
Sociocultural perspectives also emphasize that cultural conventions and social situations impact literacy practices (Perry,
2012), such that attention to the social context for writing is critical for both assessment and learning (cf. Behizadeh
& Engelhard, 2011). This is consistent with work in the learning sciences suggesting that learning to write is best conceptualized
as a process of socialization into a literate community of practice, whereby writers are guided by expert
practitioners to gradually take increasing responsibility for producing the forms and genres of writing that are valued
within a discipline or research community (Barab & Duffy, 2000; Bereiter & Scardamalia, 1987; Brown, Collins, & Duguid,
1989; Lave & Wenger, 1991). Instruction should strive to make writing socially meaningful to students (Alvermann,
2002).

This sociocognitive perspective has been applied to support the design of competency models and assessments. As an 
example, a model of ELA literacy incorporating social, conceptual, and linguistic dimensions has been developed under 
the Cognitively Based Assessment of, for, and as Learning (CBAL™) research initiative at ETS (Bennett, 2010). The CBAL
ELA competency model (Deane, 2011; Deane, Sabatini, & O’Reilly, 2011; Sabatini, O’Reilly, & Deane, 2013) specifies the
reading, writing, and critical thinking skills that are necessary to learn in order to participate in key literacy practices (e.g., 
learning from informational text, engaging in argumentation, conducting inquiry and research). It is possible to conceptu¬ 
alize the various levels of knowledge and skill required for participation in literate activities as dealing with different types 
of knowledge representations (i.e., social, conceptual, and linguistic — including discourse, verbal, and print 4 — levels; 
Deane, 2011). Broadly, expert writing can be considered to involve a set of receptive skills (processing and comprehending
information from source texts), expressive skills (synthesizing information from source texts and translating one’s
ideas into written words), and deliberative skills (applying appropriate strategic and meta-cognitive knowledge), which 
rely on the social, conceptual, and linguistic representations. A written product is the result of interactions among complex 
cognitive processes, as well as the knowledge of and skill in adapting one’s production to meet the social and rhetorical 
constraints on what kind of writing must be produced to achieve one’s purposes (Behizadeh & Engelhard, 2011). Thus, 
writing can be appropriately conceptualized as a set of sociocognitive practices (Behizadeh, 2014; Deane, 2011; Deane 
et al., in press), which experts can deploy strategically to achieve particular goals. 

The CBAL ELA model specifies how the skills that support participation in various literate practices may develop, from 
novice to expert-level performances, by positing a set of hypothesized learning progressions (LPs) for the skills that constitute
and contribute to performance of those key practices. These LPs can be used to support the design of assessments
that target specific component skills (e.g., distinguishing between primary and secondary sources, making cross-text synthesizing
inferences), while modeling the integrated practices required of professionals (e.g., writing a research report). At
the most advanced levels of practice, writers are expected to take into account their purpose, audience, and disciplinary 
knowledge; in conducting research and inquiry, for example, writers are expected to present and support an original synthesis,
review and evaluate evidence from relevant literatures (including seminal sources within the discipline), and
articulate how their work contributes to and extends current knowledge and discourse about the issue (Sparks & Deane,
2014). These types of performances may not yet be achieved by the time students enter college, but are consistent with
those expected in advanced undergraduate study, graduate study, or professional practice. For more information on this effort,
see http://elalp.cbalwiki.ets.org/. 

Cognitive Processes of Writing 

As described above, skilled writing requires the deployment and coordination of complex cognitive processes. A prevailing 
cognitive model, the Hayes-Flower model of writing (Hayes & Flower, 1980), specifies writing as consisting of interactions 
between the task environment (i.e., features of the writing assignment, such as the topic, audience, and context or purpose,
and any text one has produced so far), the writer’s long-term memory (i.e., knowledge of the topic, knowledge of the
audience, and general plans for writing), and the writing process (i.e., planning, translating, and reviewing). Each aspect 
of the writing process is goal-directed and requires self-regulation. In the planning process, the writer retrieves relevant 
knowledge from long-term memory, evaluates the usefulness of the retrieved information, selects the most useful information,
and organizes the information into a writing plan. Then the writer translates all these operations into sentences
that can be understood by others. In the reviewing process, the writer reads or rereads the existing text and revises it when 
writing goals have not been satisfied (e.g., “I should address this counterargument to persuade the audience” or “I need to 
explain this complex idea in simple words”). These processes are recursive and interactive, as planning, translating, and 
reviewing can be triggered by one’s goals. 

Expert writers demonstrate qualitatively different writing processes compared to novices. Hayes and Flower (1980) 
found that skilled writers typically established their main writing goals and subgoals early in the writing process, while 
unskillful writers spent little time planning. Attention to one’s goals for revision similarly explains observed differences in 
revising behaviors between expert and novice writers (Fitzgerald, 1987). First, expert writers typically spend substantial 
time and effort in revising their drafts (e.g., Holland, Rose, Dean, & Dory, 1985), but novice writers ignore the revision process
or have little idea about how to do it well (Graves & Murray, 1980). Second, expert writers revise their work to improve
its overall quality and to clarify the ideas that they want to convey to their audience (Hayes & Flower, 1986), while novice 
writers view revision as a task to correct grammar, spelling, diction, and punctuation (Faigley & Witte, 1981; MacArthur, 
Schwartz, & Graham, 1991; Sommers, 1980). Novice writers have an impoverished understanding of the revision process, 
resulting in revisions that are irrelevant to the meaning of the text, unconnected to genre considerations, and insufficient 
to help improve the quality of writing. While the ability to revise develops over time (Fitzgerald & Markman, 1987), many 
college students cannot perform this task adequately (Kinsler, 1990), suggesting that revision skill may differentiate more
expert from less skilled writers. 

Knowledge-Telling Versus Knowledge-Transformation 

Another key difference between the writing practices of experts and novice students is in their approach to and conceptualization
of the writing task with respect to content development and organization. In complex writing situations, where
a writer must maintain and work toward achieving multiple goals, it is challenging for novice writers to handle all of the 
writing constraints without any support. Therefore, novices tend to approach the writing task as simply telling what is 
known about the topic (Bereiter & Scardamalia, 1987). In fact, many students, including some in college, compose using 
this knowledge-telling approach, because knowledge-telling may help reduce the burden of other cognitive processes, such 
as planning and revising, which makes the task of producing text manageable. However, students using this approach often 
overlook their rhetorical goals, the needs of the audience, the organization of the text, and the writing genre (Bereiter & 
Scardamalia, 1987; Graham & Harris, 1997), reflecting a lack of goal setting and self-regulation. In contrast, expert writers
and domain experts are likely to use a knowledge-transforming approach, which involves viewing the writing task as
a problem-solving process. Writers who adopt this approach not only deal with knowledge and beliefs related to the
topic but also consider the rhetorical goals of the composition; experts make decisions about how to represent this knowledge
best in terms of the appropriate language for the intended audience, which is directly reflected in the structure of the
text (Bereiter & Scardamalia, 1987). 

Reading and Writing From Sources 

Reading and comprehending source texts gives writers content knowledge about which they can write (e.g., Hayes, 1996; 
see also Hillocks, 1987, 2005). Expert researchers across multiple domains rely on synthesis of multiple sources to situate
their ideas within a particular literature and to build support for their knowledge claims (e.g., Bazerman, 1985; Goldman, 
2004; Goldman et al., 2010; Latour & Woolgar, 1986). Writers of arguments, reports, and other research-based genres 
of writing must interpret sources, determine what information is relevant to their task and purpose, and decide what 
quotations or paraphrases to embed in the text to support their ideas. These reading-writing connections, including the 
importance of critically analyzing and using source texts to support one’s writing, are emphasized in the Framework for 
Success in Postsecondary Writing, as described previously. However, according to reports from the National Adult Literacy
Survey (Kutner, Greenberg, & Baer, 2006), fewer than one third of college graduates surveyed were proficient in
comprehending prose (extended texts, such as newspaper articles) and other documents (practical directions, such as a 
prescription medicine label), suggesting that many college students’ writing difficulties may be due to failures of reading 
comprehension. 

Even students who read proficiently may have difficulty writing syntheses or arguments because they fail (and perhaps
do not know how) to evaluate or to cite sources appropriately. Empirical research demonstrates that attention to
sources (i.e., author expertise, publication venue, possible biases) supports understanding and integration of information 
from multiple documents (e.g., Bråten, Strømsø, & Britt, 2009; Britt & Aglinskas, 2002; Sparks, 2013; Wineburg, 1991),
suggesting that students who are more attentive to the characteristics of source documents are better equipped to write 
essays or reports based on those sources. Unfortunately, empirical research generally suggests that undergraduates fail to 
attend to source information unless given specific instructions or tasks to consider it critically (e.g., Britt & Aglinskas, 
2002; Rouet, Britt, Mason, & Perfetti, 1996; Sparks & Rapp, 2011; Wiley et al., 2009). These difficulties with sourcing likely 
contribute to several common issues observed in undergraduates’ source-based essays, including plagiarism, inclusion of 
quotations without source attribution, excessive use of quotations (i.e., quote pastiche), little use of explicit citations (e.g., 
“according to Carnegie, ...”), and little evidence of synthesis across sources (Britt, Wiemer-Hastings, Larson, & Perfetti,
2004). In one study, Britt et al. (2004) asked 108 undergraduates to write opinion essays from a set of seven sources on a 
history topic, finding that “only 28% of the essays included at least one explicit reference. Considering that no participants 
made more than two explicit references, it appears that undergraduates are not fully proficient at sourcing” (p. 2). Students 
tended to cite one to two key sources rather than incorporating content and ideas from across a variety of documents. 
However, findings from experimental tasks that require students to write and cite sources from memory may not fully 
generalize to situations where students write essays with source texts and notes available to them, such as in classrooms or 
assessment situations. It remains an open question whether these contexts might encourage additional attention to critical
analysis and incorporation of sources. 

Given the preceding theoretical discussion, it is worth considering the extent to which these research perspectives on 
written communication correspond to the instructional goals and outcomes observed in educational settings, both within 
higher education and in K-12, where college readiness is a particular concern. The following section outlines the writing 
skills that are important for success in college writing. 

Writing Instruction and Learning to Write in College 

Writing for College Readiness: Connections to Common Core State Standards 

In developing a framework for written communication at the college level, it is critical to have expectations concerning 
incoming students’ knowledge and skills. To understand what writing skills are expected of someone who is ready to take 
on college-level work, one can consider the upper levels of the Common Core State Standards (CCSS) for ELA/literacy 
(National Governors Association & Council of Chief State School Officers, 2010), with a particular focus on the writing 
standards for Grades 11-12. As these standards define the highest levels of K-12 performance, they are equivalent to the 
incoming skills expected of a first-year undergraduate who demonstrates readiness for college-level writing literacy. 

As seen in Table 3, students who meet the expectations of the CCSS college and career readiness standards can comprehend
and evaluate a variety of different texts and documents; construct effective arguments and explications of complex
or multifaceted information; build and share their knowledge with others through writing; tailor communications to 
particular audiences, tasks, purposes, genres, and disciplines; select and use evidence that is appropriate for the discipline
(e.g., history, science); conduct research and inquiry from multiple sources, evaluating their reliability and credibility;
evaluate sources for their use of evidence; and cite specific textual evidence to support claims and explanations
in one’s writing. While it is certainly the case that many students will enter into higher education settings with these 
skills being less than fully developed, it is important to note that the standards correspond to many of the major dimensions
of writing that emerged from the review of frameworks above, including attention to one’s task, purpose, and
audience; writing in the genres of argument and explanation; developing and organizing one’s ideas coherently; proficiency
with the writing and revision process; conducting research; and engaging in close reading and synthesis of
sources. 

Writing Instruction in the College Classroom 

Undergraduates’ experience with writing instruction varies. Historically, the bulk of this instruction has occurred in
first-year composition courses, with little continuing writing instruction once students move on from general education
courses to more specialized work within their major discipline. Since the 1980s, however, Writing Across the Curriculum
(WAC) programs have emerged, emphasizing “active student engagement with the material and with the genres of the 
discipline through writing, not just in English classes, but in all classes across the university” (McLeod, 2012, p. 54). WAC 
views writing as a skill that must be continuously integrated into curricula, so that students can learn to communicate 
effectively within the constraints and values of their discipline through exposure to and practice of the conventions and 
genres that are valued for success in that discipline (i.e., writing in the disciplines). Importantly, writing is viewed not just as 
a way to demonstrate learning, but also as a method of learning and of refining one’s thinking (i.e., writing to learn; WAC 
Clearinghouse, 2014). Writing to learn emphasizes the reflective and sense-making functions of writing, which can help 
the writer to organize and represent his or her thoughts coherently. Writing in the disciplines asks students to produce 
genres and forms of products that are used routinely by working professionals within the field (e.g., lab reports, position 
papers, literature reviews, journal articles, and project or grant proposals), consistent with sociocognitive perspectives 
described previously. 

Despite its popularity, WAC can pose significant challenges for students. As described by Haswell (2008), “academic 
fields differ in the way they regulate every aspect of writing, from usage as minute as the function of the colon in titles 
to usage as pervasive as the way evidence is respected, gathered, and presented” (p. 416). For undergraduates, learning to 
write in the disciplines exposes them to “unfamiliar composing processes, novel genres and tasks, shifting standards and 
expectations” (p. 416), often resulting in discrepant feedback across courses (e.g., Anson, Schwiebert, & Williamson, 1993). 
Table 3 Common Core Standards for Writing, Grades 11-12

CCSS standard   Description

W.11-12.1       Write arguments to support claims in an analysis of substantive topics or texts, using valid reasoning and
                relevant and sufficient evidence.

W.11-12.2       Write informative/explanatory texts to examine and convey complex ideas, concepts, and information clearly
                and accurately through the effective selection, organization, and analysis of content.

W.11-12.3       Write narratives to develop real or imagined experiences or events using effective technique, well-chosen
                details, and well-structured event sequences.

W.11-12.4       Produce clear and coherent writing in which the development, organization, and style are appropriate to task,
                purpose, and audience. (Follow standards 1-3)

W.11-12.5       Develop and strengthen writing as needed by planning, revising, editing, rewriting, or trying a new approach,
                focusing on addressing what is most significant for a specific purpose and audience. (Editing for
                conventions should demonstrate command of language standards 1-3 through grades 11-12).

W.11-12.6       Use technology, including the Internet, to produce, publish, and update individual or shared writing products
                in response to ongoing feedback, including new arguments or information.

W.11-12.7       Conduct short as well as more sustained research projects to answer a question (including a self-generated
                question) or solve a problem; narrow or broaden the inquiry when appropriate; synthesize multiple sources
                on the subject, demonstrating understanding of the subject under investigation.

W.11-12.8       Gather relevant information from multiple authoritative print and digital sources, using advanced searches
                effectively; assess the strengths and limitations of each source in terms of the task, purpose, and audience;
                integrate information into the text selectively to maintain the flow of ideas, avoiding plagiarism and
                overreliance on any one source and following a standard format for citation.

W.11-12.9       Draw evidence from literary or informational texts to support analysis, reflection, and research.

W.11-12.10      Write routinely over extended time frames (time for research, reflection, and revision) and shorter time frames
                (a single sitting or a day or two) for a range of tasks, purposes, and audiences.


The ability to understand and adapt one’s writing to the current social and situational context (including disciplinary 
considerations) is an important part of the development of skilled writing (Carroll, 2002). After having mastered fluency 
with text production skills, college writers learn to produce different text structures, for different audiences, with various 
goals or purposes. However, development of these skills is uneven and dependent on students’ experiences with various 
instructional strategies. The most common strategies for teaching college writing, as observed across a sample of more than 
2,300 teacher intervention studies, include audience awareness, coauthoring and peer discussion, journaling, planning 
and prewriting (e.g., outlining, concept mapping), editing and proofreading, detecting and correcting errors, drafting or 
revising, and grammar instruction (Haswell, 2008). As these strategies are most commonly taught, one might predict that 
writing skills associated with those strategies would be among the most likely candidates for improvement during college 
and, therefore, could be considered potential targets for assessment. 

What Skills Can Be Expected to Develop in College Writers? 

Evidence from cross-sectional comparisons of first-year and senior students’ writing reveals that advanced undergraduates
show the largest gains with respect to vocabulary development, organization, reasoning and argumentation,
use of composition strategies, and use of longer sentences with more complex syntactic structures (e.g., Flowers,
Osterlind, Pascarella, & Pierson, 2001; Haswell, 1991; Hunt, 1970). However, there are clear limitations to drawing 
inferences about student improvement from cross-sectional data. Oppenheimer, Zaromb, Pomerantz, Williams, and 
Park (2014) conducted both cross-sectional and longitudinal analyses of growth in undergraduates’ writing performance
in response to persuasive (e.g., convince an audience about the importance of an issue; 20 minutes) and/or
expository (e.g., explain a game or hobby so that someone could read the instructions and participate in the activity;
15 minutes) writing prompts. Writing samples were scored by trained raters on a 4-point scale.
Cross-sectional results revealed average gains from first-year to fourth-year
students of 0.33 points for persuasive and 0.25 points for expository writing. Longitudinal comparisons of first-
to third-year, or second- to fourth-year, growth showed similar patterns, with essay scores improving by an average
of 0.30 points. Evidence of growth was stronger for higher performing students, with approximately half of these
students showing some improvement, but this study did not indicate what specific aspects of writing might improve
over time. 

Haswell (2000) reported results of a longitudinal study of the specific features of writing that can be observed to 
develop from the first to third year of college, using a random sample of 64 students’ responses to an impromptu 
persuasive writing prompt. Haswell found significant improvement in several areas, including mean holistic rating 
(8-point scale, human scored, consistent with Oppenheimer et al., 2014); mean sentence, clause, and overall essay 
length (related to fluency and content development); proportion of words in introductory paragraphs; proportion of
words in free modifiers (i.e., independent phrases or clauses, which can be moved to sentence-initial, mid-sentence, or 
sentence-final positions); and vocabulary (use of words with nine or more letters; p. 331). In sum, students can improve 
the cohesion, elaboration, logic, and overall quality of their persuasive and expository writing through instruction, with 
these improvements in the direction of the skills expected in professional practice (Haswell, 1986, 2000). The extent to 
which skill in the writing process and use of sources can be expected to develop during college is yet unclear, but because 
these dimensions differentiate expert from more novice writers, one could predict that college may help students develop 
these skills, to the extent that students receive appropriate instructional support. In the next section, we review existing 
assessments of written communication and the design challenges and considerations associated with developing such an 
assessment. 


Review of Existing Assessments and Design Challenges 
Existing Assessments of Written Communication 

In support of the goal of developing an operational definition of written communication, we reviewed a variety of existing 
writing assessments designed to be administered to students approaching the entry or exit point of their college education.
Specifically, we reviewed the assessments to understand the advantages and disadvantages of existing
approaches to assessing writing, and to inform both our notions of the construct and our recommendations for negotiating
particular challenges in designing such an assessment. Key features of the assessments reviewed are described
in terms of the assessment purpose, format, construct coverage, and reliability and validity evidence. Detailed information
on each assessment appears in Table 4. Table 5 shows the correspondence between the targeted skills (i.e., as
described in rubric statements and definitions) and the dimensions of the writing construct outlined in the section of this 
report. 

Purpose of the Assessments 

Assessments are created for various purposes, and these purposes affect how the assessment is designed, used, and interpreted.
Because our goal is to support the development of an SLO assessment of written communication in higher education,
we examined assessments designed for a range of purposes and use cases, including placement into developmental or 
college-level English courses (e.g., ASSET, COMPASS, ACCUPLACER, English Placement Test, AP and CLEP tests), 
admission to graduate or professional programs (e.g., GRE Analytical Writing, GMAT Analytical Writing), assessment 
of student learning outcomes (e.g., Collegiate Assessment of Academic Progress, Collegiate Learning Assessment, ETS 
Proficiency Profile; ETS, 2010a; Liu, 2008), or multiple purposes (e.g., College BASE is used for both placement and SLO 
assessment). 

Assessment Format and Construct Coverage 

Large-scale writing assessment in the United States typically takes the form of selected-response (SR) tests, extended 
constructed-response (CR) tests (i.e., composing an essay response to a prompt), or writing portfolios, which consist of 
multiple examples of student writing across contexts, genres, and purposes, collected over time (Yancey, 1999). Portfolio 
assessment could be an effective complement for higher education institutions wishing to get a more detailed view of 
students’ writing performance, particularly across genres, disciplines, and modes of expression, supported by interactions 
with and feedback from instructors (cf. Behizadeh, 2014; see Hamp-Lyons & Condon, 2000, for an in-depth review of 
portfolio assessment). Here, however, we focus on tests with SR and CR item formats. 
Table 5 Correspondence Among Existing Assessments and Dimensions of Written Communication

Assessment                          Task,         Audience      Genre         Modes and     Development   Use of        Disciplinary  Style, word   Language use  Writing
                                    context and   awareness     conventions   forms         and           sources       conventions   choice, tone  and           process
                                    purpose                                                 organization                                            conventions

Selected-response assessments a

ACCUPLACER                          X             —             —             —             —             —             —             X             X             ~
ASSET Writing Skills Test           X             —             —             —             X             —             —             X             X             ~
CAAP Writing Skills                 X             —             —             —             X             ~             —             X             X             ~
College BASE                        X             X             X             —             X             X             —             X             X             X
COMPASS Writing Skills              X             —             —             —             X             —             —             X             X             —
EPT-CSU                             —             —             —             —             X             —             —             ~             X             —
EPP                                 —             —             —                           —             —             —             X             X             ~

Constructed-response assessments a

ACT Writing Test (Essay)            ~             —             ~             —             X             —             —             —             X             —
ACCUPLACER WritePlacer (Essay)      ~             —             ~             —             X             —             —             X             X             —
AWPE (Essay)                        ~             —             ~             —             X             ~             —             ~             X             —
CAAP Essay                          —             X             X             —             X             —             —             —             X             —
CATW (Essay)                        X             —             X             —             X             X             —             —             X             —
CLA+ (Essay)                        X             —             X             —             X             X             —             X             X             —
CLEP: College Composition (Essay)   X             —             X             —             X             X             —             —             X             —
College BASE (Essay)                X             X             ~             —             X             —             —             X             X             —
COMPASS E-Write (Essay)             X             X                           —             X             —             —             X             X             —
EPP (Essay)                         X             —             ~             —             X             —             —             —             X             —
EPT-CSU (Essay)                     X             —             X             —             X             X             —             —             X             —
Georgia Regents Essay Test          X             —             ~             —             X             —             —             —             X             —
GMAT Writing                        X             —             X             —             X                           —             X             X             —
GRE-R Analytical Writing            X             —             ~             —             X             ~             —             X             X             —
MCAT Writing                        X             —             X             —             X             —             —             —             X             —
TOEFL Writing                       X             —             X             —             X             ~             —                           X             —
WSU Writing Placement Exam          X             —             X             —             X                           —             —             X             —


Note. X = assessment provides evidence of this aspect; ~ = assessment provides partial evidence of this aspect; — = assessment does not provide evidence 
of this aspect. 

a See Table 4 for full names of tests. 
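
One way to summarize a coverage matrix like Table 5 is to tally, for each dimension, how many assessments provide full, partial, or no evidence. The following is a minimal sketch in Python, not part of the original analysis. It encodes only the six selected-response rows whose cells are all legible in Table 5 (the EPP row is omitted because one of its cells is not recoverable here), and it uses the ASCII character "-" as a stand-in for the table's dash. The names `DIMENSIONS`, `SR_ROWS`, and `tally_coverage` are illustrative assumptions.

```python
from collections import Counter

# Dimension names in the column order of Table 5.
DIMENSIONS = [
    "Task, context and purpose",
    "Audience awareness",
    "Genre conventions",
    "Modes and forms",
    "Development and organization",
    "Use of sources",
    "Disciplinary conventions",
    "Style, word choice, tone",
    "Language use and conventions",
    "Writing process",
]

# Codes per assessment, one character per dimension:
# "X" = evidence, "~" = partial evidence, "-" = no evidence
# (ASCII stand-in for the dash used in Table 5).
SR_ROWS = {
    "ACCUPLACER": "X------XX~",
    "ASSET Writing Skills Test": "X---X--XX~",
    "CAAP Writing Skills": "X---X~-XX~",
    "College BASE": "XXX-XX-XXX",
    "COMPASS Writing Skills": "X---X--XX-",
    "EPT-CSU": "----X--~X-",
}


def tally_coverage(rows):
    """Count how often each code appears under each dimension."""
    counts = {dim: Counter() for dim in DIMENSIONS}
    for name, codes in rows.items():
        if len(codes) != len(DIMENSIONS):
            raise ValueError(f"{name}: expected {len(DIMENSIONS)} codes")
        for dim, code in zip(DIMENSIONS, codes):
            counts[dim][code] += 1
    return counts


if __name__ == "__main__":
    for dim, counter in tally_coverage(SR_ROWS).items():
        print(f"{dim}: full={counter['X']} partial={counter['~']} none={counter['-']}")
```

For these six rows, language use and conventions is covered by every assessment, whereas disciplinary conventions and modes and forms are covered by none.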


Selected-Response Format (Indirect Writing Assessment) 

Eight assessments we reviewed included an SR section; of these, assessments designed for placement purposes were most 
prevalent (e.g., ASSET, ACCUPLACER, and EPT), but SLO assessments such as the EPP and CAAP include SR items as 
well. 5 Most common SR item types include (a) revision-in-context items, in which a section of a sentence or passage is 
underlined, and examinees can either select the most appropriate revision to correct an error in grammar, usage, or syntax 
or indicate that no revision is needed and (b) construction-shift items, which present an alternate beginning to a stimulus 
sentence and require examinees to select the best continuation of that stem from the options provided. ACCUPLACER, 
EPT, and EPP include both of these item types; CAAP includes revision-in-context items presented within a passage. Other 
SR item types ask examinees to select the best sentence to fill a blank in a paragraph (in initial, middle, or final-sentence 
position; e.g., EPT) or to answer questions about the writer’s rhetorical or stylistic goals (e.g., CAAP). 

From a measurement perspective, SR assessments have some advantages, as they tend to be more cost-effective in terms 
of administration and scoring than CR items, and they are considered more objective (i.e., with specific and distinct correct 
responses versus open-ended items that may not have a single correct answer). SR items are often faster to complete, 
meaning that examinees can respond to more of these items in the allotted time compared to the number of CR items 
(e.g., for the CAAP, test takers are asked to respond to 72 SR items, as compared to two CR essays, in 40-minute sessions) 
and, therefore, an SR test typically has a higher reliability than a CR test taking the same amount of time. 6 The SR items may 
also demonstrate better prediction of criterion scores (i.e., scores on a series of short essay tasks) than a single holistically 
scored CR essay (e.g., Godshalk, Swineford, & Coffman, 1966). 
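
The link between the number of items and reliability can be illustrated with the Spearman-Brown prophecy formula from classical test theory. The report does not name this formula, so the sketch below is an illustration under stated assumptions rather than the authors' method: it assumes that any added items are parallel to the existing ones, and the reliability values used are hypothetical. The function name `spearman_brown` is likewise an illustrative choice.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predict reliability after changing test length by `length_factor`.

    `reliability` is the reliability of the current test (0 <= r < 1);
    `length_factor` is new length divided by current length (> 0).
    Assumes the added or removed items are parallel to the existing ones.
    """
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1.0 + (length_factor - 1.0) * reliability)


if __name__ == "__main__":
    # Hypothetical example: a short measure with reliability 0.50,
    # lengthened to four times as many parallel items.
    print(round(spearman_brown(0.50, 4), 2))
```

Under these assumptions, a format that fits more scorable units into the same testing time would be predicted to yield higher reliability, which is the pattern described above for SR relative to CR tests.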

However, with respect to construct representation, use of SR items has some clear limitations. SR items have been 
said to “fail to address the cognitive and reflective processes involved in creating a text — such as making plans for 
writing, generating and developing ideas, and making claims and providing evidence” (Murphy & Yancey, 2008, p.
450; see also Odell, 1981), suggesting that SR items underrepresent the writing construct. Consistent with this notion, 
the SR items we reviewed overwhelmingly assessed lower level writing abilities, such as language conventions (grammar,
usage, mechanics), style (i.e., word choice, sentence variety, and register), and organization (text structure and
sequence of ideas), at the expense of higher order writing skills. SR item types such as revision-in-sentence-context 
and construction-shift items are typically used to assess students’ knowledge of local organization, style, and language
conventions while only indirectly assessing skill in the revision process. CAAP Writing, for example, provides
students with subscores for usage/mechanics and rhetorical skills (i.e., strategy, purpose, organization, and style). 
Revision-in-passage-context items assess skill in usage, mechanics, or style, targeting a specific text section, while 
strategy and organization items might ask about the passage as a whole; a typical rhetorical strategy question might 
ask examinees to evaluate the appropriateness of a quotation in the passage, given a particular communicative goal. 
This example item indirectly addresses use of sources but is considered mainly in terms of attention to rhetorical 
purpose. The College BASE SR writing test also includes items assessing skill in selecting appropriate prewriting strategies,
text structure and organization, choosing sources for a particular purpose, and revision; these items contribute
to a subscore for understanding the writing process. While the College BASE had the widest construct definition of 
any SR assessment we reviewed, given the limited number of items on the test (i.e., 16-18), it is unlikely that the 
intended construct can be adequately covered such that it provides useful information about rhetorical or conceptual 
skills. 

Thus, while SR items can be used to evaluate linguistic as well as more rhetorical dimensions, the emphasis on rhetorical
skills, and the extent to which they can be measured reliably, may vary with particular test designs. For example, students
who demonstrate proficiency with SR revision items can be said to possess the abilities to manipulate sentences, to
correct errors in diction and syntax, and to recognize inappropriate relations among clauses (e.g., ACCUPLACER). Given
the review presented in the first section of this article, it is clear that the writing skills that can be effectively assessed by 
asking students to correct errors within single sentences are largely limited to those related to language conventions (i.e., 
grammar, usage, syntax, and mechanics). This tendency of SR items to focus on low-level mechanics and usage in lieu 
of higher order cognitive skills is a primary objection to the use of SR tests to measure writing proficiency (cf. Murphy 
& Yancey, 2008). Arguably, revision-in-passage-context items can assess discourse-level, rather than sentence-level, processing,
which is more consistent with the kinds of literacy practices that are expected of college writers who deal more
often in extended text and discourse than with discrete sentences presented in an isolated fashion. However, in general, SR 
assessments still only estimate students’ probable writing ability, by testing discrete knowledge and skills that are associated 
with writing, rather than evaluating students’ ability to produce coherent, error-free writing, as in CR assessments. 

Constructed-Response Format (Direct Writing Assessment) 

The majority of writing assessments we reviewed included CR items, which directly assess students’ writing skills. Typically,
CR assessments require examinees to compose an essay in response to a prompt or stimulus under controlled
conditions; the texts produced are then evaluated, whether by human raters, automated writing evaluation systems, or 
some combination of the two. The CAAP essay, COMPASS e-Write, CLA+ Performance Task, CUNY CATW, and the 
GRE Analytical Writing are all examples of CR tests. Many such assessments ask examinees to take a position and present 
a well-developed argument using supporting evidence from one’s own readings and experiences (e.g., CAAP essay, GRE 
issue task) or to critically analyze arguments or information presented in a text (e.g., CLA+, GRE argument task). 

Use of CR format items is consistent with the widely held perspective that the most valid measures of writing ability 
are those that actually require students to write extended text (cf. Fowles, 2012; Yancey, 1999). In contrast to SR tests, 
CR items are more authentic, in that they treat writing as an active, social, communicative process (Murphy & Yancey, 
2008). That is, CR tasks require examinees to deploy and demonstrate proficiency with the social, cognitive, and linguistic 
processes that are necessary to solve the rhetorical problem posed by the prompt (cf. Bereiter & Scardamalia, 1987). Under 
timed conditions, the writer’s fluency with these processes becomes particularly important, as he or she needs to be able to 
produce clear and effective text with a logical and coherent organization and structure, despite limited time for planning 
and revision (Hayes & Flower, 1980). The greater one’s fluency with low-level language processes, the more one can use 
available cognitive and conceptual resources to develop and organize ideas, to engage with the intended audience, and 
to address the rhetorical goals of the piece. Fluent writers’ essays are also less likely to be marked by errors in syntax, 
mechanics, grammar, and word choice. 

However, the extent to which examinees are expected to engage in higher level social and rhetorical problem solving 
in a given CR assessment depends on the assigned prompt. Most assessments we reviewed assessed students’ response 
to the assigned task and genre of writing requested (i.e., argument); organization and content development; word choice 
and style; and adherence to conventions and control of grammar, usage, and mechanics. For example, CR prompts that 
ask examinees to take a position on an issue, to support that position with reasons and examples, and to anticipate counterarguments,
while addressing the response to a specific audience (e.g., CAAP, COMPASS e-Write), provide evidence of
students’ skill in several aspects of writing: adapting writing to purpose and audience, adherence to genre conventions for 
argument structure and quality, development and organization of ideas, and facility with both stylistic and grammatical 
language conventions. It is notable that while any CR assessment could potentially evaluate audience awareness by asking 
writers to address a specified audience, only three assessments we reviewed included this task requirement (i.e., CAAP, 
COMPASS e-Write, and College BASE). In the majority of CR assessments, then, aside from the raters who score the essay,
examinees are writing arguments to no one in particular, which does not truly count as an instance of written communication
(Condon, 2013); this lack of authentic social features has led some researchers to claim that CR tests largely ignore
social and cultural elements, using “one narrow version of literacy to represent a broad construct” (Behizadeh, 2014, p. 
128). Given the importance of audience awareness in advanced writing proficiency, it is important to assess this aspect of 
writing; yet, many current CR assessments fail to do so. 

CR assessments can also be used to evaluate students’ use of sources in writing. For example, the CLA+ performance 
task presents examinees with a document library, which they can consult and use as evidentiary support in addressing key 
questions and making an argument about an issue described in the prompt. Others include a more limited text stimulus, 
yet they still require students to critically evaluate or summarize information from sources. For example, CATW asks 
examinees to respond to a reading passage of 300-350 words by summarizing the most important ideas of the author and 
explaining the significance of one key idea, using supporting evidence and examples from prior learning or experience 
(CUNY, 2012). Students are assessed in terms of understanding and responding to the main ideas in the passage and the 
use of supporting details and examples, including specific references to the passage. Other assessments partially deal with 
use of sources, by either asking examinees to summarize or explain the ideas in a passage (e.g., TOEFL integrated task) 
or to critique those ideas, without requiring examinees to quote or cite information from those sources as support for 
their ideas. For example, studies of expert raters indicate that although the GRE Analytical Writing tasks provide much 
information that is relevant to important writing skills at both the undergraduate and graduate levels (e.g., organizing 
ideas and information coherently, following conventions of standard written English), they do not provide information 
about students’ ability to credit sources appropriately or to integrate quoted or referenced material into their own text 
(Rosenfeld, Courtney, & Fowles, 2004). Such assessments provide better measurement of students’ critical reading and 
analytic skills, rather than their skill in writing from multiple sources, per se. 

Further, no CR assessment provided information about students’ writing process, other than drafting. The nature of 
most on-demand writing assessments precludes assessment of planning or revision, because examinees respond to a single 
prompt in a limited amount of time. While the final written product is saved and evaluated, the composition process is 
not captured. However, with technology-enhanced delivery, the writer’s process can be captured for subsequent analysis. 
Evidence from analyses of keystroke logs suggests that process-level features can predict students’ writing proficiency 
(e.g., Deane, 2014), though efforts to use keystroke-logging techniques on the fly to evaluate and score the efficiency and 
effectiveness of students’ processes or to deliver just-in-time feedback are still in early stages. Systems like ETS’s Criterion 
Online Writing evaluation service can help support students and teachers in understanding and engaging in the writing 
process by providing planning tools and a collection of prompts to which writers can compose responses and receive 
instant feedback (provided by the e-rater engine) about aspects of the text that could be improved through revision. 
When students successfully address the feedback provided by the system, their scores may improve if they resubmit the 
revised essay to Criterion (though evidence suggests that some implementations of Criterion may not take full advantage 
of the planning and revision tools; Warschauer & Grimes, 2008). However, even this system does not provide assessment 
of the writing process per se. New assessment designs that incorporate distinct planning and revision tasks, or traditional 
revision-in-context SR items, may be required beyond typical CR tasks, if assessing the writing process is considered a 
priority. Similarly, assessments intended to provide information about students’ proficiency with composing in multiple 
modes and formats (i.e., using technology-enhanced composition tools); disciplinary conventions; or genres other than 
argument, critique, or explanation (such as a research report) will necessitate different assessment design strategies than 
those currently observed in the market. 

In sum, relative to SR items, CR items demonstrate better coverage of the written communication construct. However,
CR items have other notable constraints, such as the extended testing time required for essay writing tasks and
the increased costs associated with scoring the responses, particularly if human raters are to be used (Williamson, Xi, & 
Breyer, 2012). We return to these issues when discussing assessment design challenges; but, first, we examine reliability 
and validity evidence for the assessments reviewed. 

Reliability and Validity Evidence 

Table 6 presents a summary of reliability and validity evidence available for each assessment reviewed. Reliability and 
validity evidence has been examined throughout the literature, particularly for three popular, widely used assessments: 
CAAP, CLA/CLA+, and EPP. Substantial validity evidence has also been gathered for the GRE Analytical Writing assessment.
Importantly, for many of the assessments we reviewed, written communication represents only a part of a larger
assessment; accordingly, for the purposes of the current review, only reliability and validity evidence pertaining to the 
writing sections will be examined here. 

Reliability Evidence 

Often, written communication assessments represent a subtest of a larger suite of assessments; therefore, it is important 
to demonstrate evidence of adequate reliability (i.e., internal consistency) for those subtest scores. The reliability of a 
particular test score is highly related to the number of items within that test, so test length is an important consideration 
with respect to reliability (Sinharay, Puhan, & Haberman, 2011). Partly for this reason, CR assessments often demonstrate
lower reliability than SR assessments, where a larger number of items can be administered within the same testing
time. Still, sufficient numbers of items must be administered to achieve adequate reliability. For example, the EPP only
reports individual subtest scores (i.e., a separate score for EPP Writing, Reading, Critical Thinking, and Mathematics) if 
individuals take the standard form, with 27 items per section, but not the abbreviated form, with only nine. EPP Writing 
has demonstrated alpha reliability coefficients of .81 (ETS, 2010a) and school-level reliability of .91 (Klein et al., 2009); 
estimates for the SR CAAP writing section are similarly high, with school-level reliability of .88 (Klein et al., 2009) and 
KR-20 of .92 (CAAP Program Management, 2012). CAAP also reports sufficient reliability of the Rhetorical Skills and 
Usage/Mechanics subscales, with KR-20 ranging from .84 to .86 across forms (CAAP Program Management, 2012). 
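
As an illustration of the internal-consistency coefficients cited above, the following Python sketch computes KR-20 from a matrix of dichotomously scored SR items. The function name `kr20`, the use of population variances, and the small response matrix are assumptions made for illustration only; none of them comes from the assessments reviewed.

```python
from statistics import pvariance


def kr20(responses):
    """Return KR-20 for rows of 0/1 item scores (one row per examinee).

    Assumes every row has the same number of items and that total
    scores are not all identical (otherwise the variance is zero).
    """
    n_people = len(responses)
    n_items = len(responses[0])
    # Proportion of examinees answering each item correctly.
    p = [sum(row[i] for row in responses) / n_people for i in range(n_items)]
    # Sum of item variances; for a 0/1 item the variance is p * (1 - p).
    item_var_sum = sum(pi * (1 - pi) for pi in p)
    # Population variance of examinees' total scores.
    total_var = pvariance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - item_var_sum / total_var)


# Hypothetical responses: five examinees by four items.
example = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(example), 2))
```

Coefficient alpha generalizes the same computation to items that are not scored 0/1 by replacing p(1 - p) with each item's observed variance.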

In contrast to SR tests, CR assessments are typically less reliable. For example, school-level reliabilities for CAAP Essay 
(.75), CLA Make an Argument (.84), and the CLA Performance Task (.75) are lower than estimates observed for SR-format 
assessments, but all reliability estimates exceeded .70 except for the school-level reliability of the CAAP Essay for first-year 
students, which was .68 (Klein et al., 2009). Reliability for the GRE Analytical Writing section is estimated at .82 (ETS, 
2013b), similar to the figures for the CAAP and CLA MA tasks, but slightly higher than the estimated reliability (.77) of the 
analytical writing section in the version of the GRE used prior to August 1, 2011 7 (ETS, 2010b). The CLA+, the most recent 
version of the CLA, also yields relatively low individual-level reliability estimates for the Performance Task; specifically, 
CAE reports coefficient alphas for the Performance Task of .43 and .57 for test forms A and B, respectively (Zahner, 2013). 
The CLA+ Performance Task provides measures of students’ writing mechanics and writing effectiveness, in addition 
to analytic reasoning and problem solving (i.e., a critical thinking measure), by using trait scoring, rather than holistic 
scoring. The total CLA+ test achieves a higher reliability (i.e., alpha between .85 and .87) by combining the CR Performance 
Task with highly reliable SR items assessing other skills. However, the low reliability estimates observed suggest that a 
subscore for writing should not be reported. 
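
The dependence of reliability on test length can be illustrated with the Spearman-Brown prophecy formula, which projects the reliability of a test lengthened by a factor k on the assumption that the added tasks are parallel to the existing ones. The sketch below applies it to the Performance Task alphas of .43 and .57 reported above. The function names and the .70 target are illustrative assumptions, and the projections are hypothetical rather than reported results.

```python
def spearman_brown(reliability, k):
    """Projected reliability when test length is multiplied by k."""
    return k * reliability / (1 + (k - 1) * reliability)


def length_factor_needed(reliability, target):
    """Length multiplier needed to reach a target reliability."""
    return target * (1 - reliability) / (reliability * (1 - target))


for alpha in (0.43, 0.57):
    doubled = spearman_brown(alpha, 2)
    factor = length_factor_needed(alpha, 0.70)
    print(f"alpha={alpha:.2f}: doubled length -> {doubled:.2f}; "
          f"length factor to reach .70 -> {factor:.1f}")
```

Under these assumptions, doubling the number of tasks would raise the projected reliabilities to roughly .60 and .73, and about three parallel tasks would be needed to reach .70 from a starting value of .43.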

Interrater Reliability 

Interrater reliability measures the degree of agreement among raters for CR assessments. Many studies measure interrater
reliability by estimating the consistency between raters using correlation methods or percent agreement; for these
consistency estimates, values exceeding .70 are considered acceptable, yet thresholds for interrater agreement may vary,
depending on the stakes of the assessment. For the CLA+, interrater correlations of .67 to .75 have been observed across
several forms of the performance task, which is scored using a 3-trait rating system (Zahner, 2013). For the CAAP Essay,
interrater reliability estimates range from .68 to .74 across seven prompts, with percentage of perfect agreement ranging
from 70 to 78% on a 1-6 holistic rating scale (CAAP Program Management, 2012). Thus, these assessments appear to
achieve acceptable interrater reliability as measured by consistency estimates (Stemler, 2004).

Table 6 (partial; only the following rows and cells are legible in the source)

Assessment | Author/Year | Subjects | Sample | Reliability | Validity evidence
ETS Proficiency Profile (EPP), formerly Measure of Academic Proficiency and | Banta and Pike (1989) | College seniors | 1,228 | KR-20 α = .79-.84 | Multiple choice writing subscore correlated at r = 0.58 with ACT scores, total score correlated at r = 0.72 with ACT scores.
 | CAAP Program Management (2012) | Students participating in national administrations | 80,010 | KR-20 α = .92 for Writing Skills |
 | Haswell (1998) | Juniors | 2,500 | KR-20 α = .83 | 81% of native writers, transfer and nontransfer, and nontransfer nonnative writers pass.
 | | | | | had a precision-weighted average observed effect size of 0.37, with a standard error of 0.092.
 | | | | | At the institution level, the essay section correlated with the MAPP at r = 0.70, the CLA MA at r = 0.67, and the CAAP itself at r = 0.74.
 | | | | | In 2010, Criterion predicted course grades at statistically significant levels for all groups with sufficient sample size.

ETS Research Report No. RR-14-37. © 2014 Educational Testing Service

For CR items scored by both human raters and automated scoring systems, correlations between those two ratings
are reported as a measure of the extent to which the human and automated system agree. Correlations observed
between human and automated scores were somewhat lower than correlations among human raters. As one example, 
the WritePlacer online assessment reports Pearson correlations of r = .63 between holistic scores assigned by humans 
and those assigned by the IntelliMetric automated essay scoring system (James, 2006); operationally, this assessment 
uses automated scoring as the sole method of scoring student essays. For COMPASS e-Write, observed correlations 
between human raters and automated scores ranged from r = .67 to .83 across prompts for holistic scores; correlations
between human and automated scores on analytic (trait scored) subscales ranged from r = .55 to .60 across
prompts (ACT, 2006), suggesting that automated scoring methods may not be able to provide sufficiently reliable trait 
scores. 

Further research on the validity of ETS’s e-rater has reported correlations between human and e-rater scores that are 
comparable to those observed between two human raters. For example, Burstein, Kukich, Wolff, Lu, and Chodorow (1998) 
examined scores on a sample of 500 GMAT analytical writing essays, across a sample of eight argument and five issue 
prompts. They reported correlations of .82 to .89 between two human raters, compared to .79 to .87 between e-rater and 
each of the human raters. Reported human/e-rater correlations for the GRE are somewhat lower (.73 to .74) compared to 
human/human correlations (.83 for argument and .85 for issue prompts; Powers, Burstein, Chodorow, Fowles, & Kukich, 
2002a, 2002b). Research on the IntelliMetric system has reported average correlations between human and automated 
scores of .83 across six different GMAT analytical writing prompts (Rudner, Garcia, & Welch, 2006); this correlation is the 
same as the average observed correlation between two human raters (r = .83), indicating comparable interrater reliability 
across automated essay scoring (AES) and human scoring methods. Correlations ranged from .80 to .84 across forms of 
the argument task and from .83 to .87 for the issue task, indicating that both task types achieved good reliability. However, 
because this agreement is imperfect, Powers et al. (2002a) suggested only using automated scores to supplement human 
ratings, particularly under high-stakes testing conditions. 
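
The two consistency estimates discussed in this section, the correlation between raters and the percentage of exact agreement, can be sketched in a few lines of Python. The scores below are hypothetical ratings on a 1-6 holistic scale invented for illustration, and the function names are assumptions rather than part of any scoring system described here; the same functions apply whether the second rater is a human or an automated engine.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def exact_agreement(x, y):
    """Proportion of responses given identical scores by both raters."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


# Hypothetical holistic scores from two raters on eight essays.
rater_a = [4, 3, 5, 2, 4, 6, 3, 4]
rater_b = [4, 3, 4, 2, 5, 6, 3, 4]

print(f"r = {pearson_r(rater_a, rater_b):.2f}")
print(f"exact agreement = {exact_agreement(rater_a, rater_b):.0%}")
```

The two indices can diverge: a rater who is consistently one point more severe than a colleague would correlate perfectly with that colleague while showing no exact agreement, which is one reason both are often reported.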

Convergent Validity Evidence 

Convergent validity evidence concerns the relationship between scores across tests measuring similar constructs (AERA, 
APA, & NCME, 1999). Klein et al. (2009) reported correlations between the two SR EPP and CAAP Writing tests of 
.72 at the student level and .97 for the school level, representing a very strong relationship in the aggregate. Overall, SR 
assessments of writing skill as an SLO appear to be better correlated with one another than comparable CR format tests. The 
lowest student-level correlations among the writing measures administered by Klein et al. (2009) were observed between 
the CAAP essay and the EPP (r = .33), the CLA Performance Task (r = .32), and the CLA MA task (r = .37). Again, at
the institution level, these correlations were somewhat higher (r_EPP = .70, r_CLA-PT = .58, r_CLA-MA = .67), indicating that, to
some extent, both SR and CR assessments measure a comparable construct, but that this relationship is far from perfect.
Klein et al. (2009) attributed the low correlations with open-ended measures of written communication in part to the
low reliability of CR assessments with few items, noting that multiple essays would enhance test reliability. Others have
suggested that good estimates of students’ writing ability can be obtained by combining SR and CR formats (e.g., Breland, 
Camp, Jones, Morris, & Rock, 1987). This logic is evident in the designs of the CLA+ and CAAP, which each combine SR 
with extended CR item formats. 

For the GRE, moderate correlations have been observed among GRE Analytical Writing tasks and the GRE Verbal 
section, with estimates ranging from r = .51 for the issue task and r = .55 for argument (Ramineni, Trapani, Williamson, 
Davey, & Bridgeman, 2012) to .66 overall (ETS, 2013b), suggesting that the CR Analytical Writing section measures skills 
that are related to, but somewhat distinct from, verbal reasoning skills. Further, scores from the Criterion Online Writing 
evaluation system showed moderate correlations with SAT writing in 2009 (r = .43) and 2010 (r = .41; Klobucar, Elliot, 
Deess, Rudniy, & Joshi, 2012). Thus, CR assessments achieve moderate evidence of convergent validity. 

Concurrent Validity Evidence 

Concurrent validity refers to the relationship between an outcome and a criterion measured at the same time (AERA et al., 
1999). Evidence of concurrent validity has been evaluated for several assessments we reviewed, particularly by computing 
correlations among the assessment scores and other measures, such as ACT scores, SAT scores, or GPA. For example, EPP 
writing correlates with ACT scores (r = 0.58; Banta & Pike, 1989), while data sampled over a 10-year period shows that 
students with a higher college GPA consistently achieved higher EPP writing scores (Liu & Roohr, 2013). 

For CR tests, the optional EPP essay correlates with both community college placement exams (r = .51) and SAT scores 
(r = 0.27-0.37; Liu, Bridgeman, & Adler, 2012). Compared to EPP, the CLA Performance Task demonstrates higher correlations
with SAT (.56 and .54 for first-year and senior students, respectively; Klein, Benjamin, Shavelson, & Bolus, 2007).
Total CLA scores have a school-level correlation with SAT of .90, while at the student level, CLA total has a moderate correlation
with college GPA (r = .50, increasing to .65 when adjusted for reliability; Benjamin & Chun, 2003). Correlations
between GRE Analytical Writing scores and college GPA are relatively low, ranging from r = .13 to .20 (Powers, Fowles, &
Welsh, 2001; Ramineni et al., 2012), with the highest correlations observed with GPA in writing-intensive courses (r = .34;
Powers et al., 2001). In terms of placement tests, the COMPASS e-Write essay, which is scored using automated scoring
techniques, correlates r = .21 with the total COMPASS test, r = .29 with ACT English, and r = .21 with ACT composite
(Matzen & Sorensen, 2006). 
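
An adjustment for reliability of the kind mentioned above is commonly made with the classical correction for attenuation, which divides the observed correlation by the square root of the product of the two measures' reliabilities. The sketch below is a minimal illustration. The source does not state which reliability values were used in the CLA and GPA adjustment, so the values .75 and .80 are hypothetical, chosen only to show the size of adjustment the formula can produce.

```python
from math import sqrt


def disattenuate(r_observed, reliability_x, reliability_y):
    """Correlation corrected for unreliability in both measures."""
    return r_observed / sqrt(reliability_x * reliability_y)


# Observed correlation from the text; both reliabilities are hypothetical.
corrected = disattenuate(0.50, reliability_x=0.75, reliability_y=0.80)
print(f"corrected r = {corrected:.2f}")
```

Because the correction grows as reliability falls, low-reliability CR scores can understate the underlying relationship between writing skill and a criterion such as GPA.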

Predictive Validity Evidence 

Predictive validity concerns the extent to which outcomes such as college GPA can be predicted from the assessment 
scores. The predictive validity of the SR CAAP writing skills test was evaluated by examining the relationship between 
sophomore CAAP writing scores and junior-level GPA. Across seven institutions (n = 1,514), junior English GPA had a 
median correlation of .25 with sophomore CAAP writing skills scores (ACT, 2010). Further, the median cross-institutional 
correlation between sophomore CAAP writing skills and cumulative English GPA was .37, with a range of .26-.57 across 
a sample of eight postsecondary institutions (ACT, 2010). Thus, sophomore CAAP scores have modest predictive ability 
for junior-level GPA. For the EPP, Lakin, Elliott, and Liu (2012) observed a significant relationship between college credit 
hours and EPP Writing score (r = .31). Marr (1995) also reported significant Spearman rank correlations between EPP 
Writing and percent of total core college courses completed (r = .19) and core courses completed in humanities (r = .07), 
social science (r = .06), natural science (r = .12), and mathematics (r = .12). 

While predictive validity evidence for CR format tests is more limited, the English Placement Test essay has been 
demonstrated to correlate r = .35 with English grades and r = .21 with fall semester GPA (Michael & Shaffer, 1978). For the
GRE, Klieger, Cline, Holzman, Minsky, and Lorenz (in press) reported small but significant correlations (r = .16) between
GRE Analytical Writing scores and graduate-school GPA for samples of more than 24,000 graduate students. The highest 
correlation of GRE-AW with GPA at the master’s level was observed for English language and literature students (r = .28), 
suggesting that this test was most predictive for fields requiring considerable reading, critical analysis, and writing of 
texts. 

For placement tests, predictive validity is evaluated in terms of placement accuracy, or the extent to which students are 
placed in a course of study in which they are likely to be successful. ACT Writing accurately placed 65% of students, with 
66% of students earning a B or better (ACT, 2009). The ASSET Writing Skills test performs similarly, with Moss and Yeaton 
(2006) reporting that in college-level English classes 68% of students correctly placed in college English earn a B or better, 
compared to 54% of students who were initially sorted into developmental English courses. Placement accuracy rates for 
the COMPASS Writing Skills test range from 60% (Davey, Godwin, & Mittelholtz, 1997) to 66% earning a B or higher 
(ACT, 2006; Belfield & Crosta, 2012; Scott-Clayton, 2012). Placement accuracy rates are lower for the ACCUPLACER 
(59% earning a B or higher; Belfield & Crosta, 2012; Mattern & Packman, 2009). 

Challenges in Designing Written Communication Assessment 

Designing educational innovations involves negotiating a series of tradeoffs: designers must decide which design
aspects to prioritize when those aspects are in tension with one another (cf. Collins, 1996).
Designing assessments of written communication presents a number of specific challenges, which we describe below. 

Balancing Authenticity and Psychometric Quality 

Authenticity of writing assessment is considered a critical component of the validity of writing assessments, concerning 
both face and construct validity (Murphy & Yancey, 2008). Authenticity can be defined as the extent to which the features 
of an assessment task correspond to the features of the situations in which the skills being assessed will be used and 
applied in the real world (Bachman & Palmer, 1996). For higher education, the notion of authentic writing assessment 
suggests that the tasks included in the assessment design should correspond to the types of writing assignments required 
of students in their undergraduate coursework, such as writing arguments or research articles (Burstein et al., 2014). The 
notion of authentic writing assessment is consistent with the position statement on writing assessment released by the 
Conference on College Composition and Communication (CCCC, 2009), which asserts that best assessment practices 
ask students to produce writing within a meaningful context: 

The assessment of writing must strive to set up writing tasks and situations that identify purposes appropriate to and 
appealing to the particular students being tested.... What is easiest to measure—often by means of a multiple choice 
test—may correspond least to good writing; choosing a correct response from a set of possible answers is not 
composing. (Principles 2A and 2B) 

Generally, CR format assessments are considered more authentic relative to tests consisting solely of discrete SR items, 
because they require students to compose extended text. However, some scholars argue that CR tasks are still not 
particularly authentic because they ask students to write about unfamiliar topics under highly constrained conditions. For 
example, Weigle (2002) argued that on-demand CR assessments “[do] not accurately reflect the conditions under which 
most writing is done in nontesting situations or writing as it is taught and practiced in the classroom” (p. 197). 

A balance of authenticity and psychometric quality could be achieved through a combination of direct and indirect 
writing assessment (i.e., use of both SR and CR item formats; Breland et al., 1987). Providing students with a meaningful 
and realistic task context for writing an essay (e.g., to persuade the Board of Trustees to adopt a particular policy; to 
identify and explain to key stakeholders the critical flaws in a business proposal) offers a more authentic assessment 
task with a specific purpose, audience, and context for the writing task, consistent with the notion that all writing is 
fundamentally social (CCCC, 2009). The authenticity of SR items assessing skill in identifying and revising errors could 
be enhanced by presenting items in the context of an extended passage (versus discrete sentences) and a realistic task 
(e.g., attending to a peer’s feedback on a passage; Haswell, 2008). Research on scenario-based assessments (Sabatini et al., 
2013; Sheehan & O’Reilly, 2012) can inform the design of literacy assessments that have a balance of authentic purposes 
and desirable measurement properties. 

Assessment Purposes: Supporting Institutional or Individual Goals 

Members of the higher education writing community have suggested that assessment should primarily function to support 
evidence-based decision making intended to improve the teaching and learning of writing (CCCC, 2009; NCTE-WPA, 
2010). Further, the intended purpose of a writing assessment should influence its design (CCCC, 2009). The desire for 
assessment results to provide actionable information to the institution in the service of improving teaching and learning 
suggests a need for alignment between the constructs measured in the assessment and the competencies that are relevant 
to the local curricular and instructional context. Alignment between instruction and assessment is also important for 
the measurement of student growth attributable to a curriculum or course of study; as Haswell (2008) noted, “The gain 
[from an intervention strategy] most often occurs when the classroom intervention is clear and concrete and when the 
measurement of writing accomplishment focuses analytically on traits associated with the teaching method” (p. 410). 
Thus, to have instructional value, the assessment results should inform institutions about the aspects of writing that pose 
challenges for their students, which could be addressed through instruction. 

Some assessment formats may be more appropriate for supporting some institutional goals. For example, portfolio 
assessment represents an approach to evaluating student writing that is highly tailored to the local context (Behizadeh, 
2014; Yancey, 1999), which may be quite useful for informing local curricular and instructional improvements. Institutions 
may also wish to make comparative evaluations of writing proficiency for groups of students across institutions, for 
purposes of benchmarking or accountability; this requires assessments that are not so locally defined that the test will 
fail to yield meaningful comparisons when administered to a different population of students, at schools using different 
curricula or instructional approaches. This goal suggests a relatively domain- and discipline-general approach to designing 
writing assessments, in which assessment tasks measure aspects of the construct that are practiced across a 
range of student majors and fields of study, so as not to advantage students from a specific curriculum or course of study. 
This logic is evident in SR assessments that measure the surface-level, linguistic aspects of writing skill, which may yield 
reliable comparisons yet have more limited instructional relevance, particularly for low-level editing skills that students 
are presumed to have mastered prior to enrolling in college. 8 Despite this presumption, usage and mechanics may still be 
considered instructionally relevant, given that college student populations are increasingly diverse, with many students 
enrolling in college courses unprepared for the writing assignments that are required of them. Because institutions need 
assessments that provide actionable information about the strengths and weaknesses of highly diverse student groups, 
evaluating the linguistic aspects of students’ writing remains an important goal. Therefore, to support the goals of 
comparability and instructional improvement across populations and institutions, writing assessments should provide evidence 
of students’ proficiency with linguistic, as well as conceptual and rhetorical, aspects of the writing construct. 

Beyond institutional goals, the extent to which a writing assessment is intended to support individual-level goals 
dictates the extent to which scores that are reliable at the student level are required. If the assessment is designed for 
institutional use only, the scores provided by the assessment need only be reliable at the group level (i.e., at the level of the 
institution), rather than reliable at the level of the individual student. For example, some SLO assessments are primarily 
designed to provide information at the aggregate group level and may not require highly reliable individual scores. In 
contrast, placement tests, which have stakes in terms of the course of study an individual may pursue, must be reliable at 
the individual level, due to the potential consequences for the student’s educational trajectory. Similarly, if the results will 
be used for credentialing, such as a certificate or badge, it is important that such certifications be reliable at the individual 
level—particularly if those credentials have consequences for educational attainment or employment. 
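
The distinction between group-level and individual-level reliability can be illustrated with standard classical test theory arithmetic. The sketch below assumes independent measurement errors across students and uses made-up values for the score standard deviation, reliability, and group size; it ignores sampling error from which students happen to be tested.

```python
import math


def individual_sem(score_sd: float, reliability: float) -> float:
    """Classical standard error of measurement for a single examinee's score."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)


def group_mean_error(score_sd: float, reliability: float, n_students: int) -> float:
    """Measurement error remaining in a group mean of n_students scores.

    Assumes measurement errors are independent across students.
    """
    if n_students < 1:
        raise ValueError("n_students must be at least 1")
    return individual_sem(score_sd, reliability) / math.sqrt(n_students)


# Hypothetical values: a modestly reliable score on a scale with SD = 1.0.
print(individual_sem(1.0, 0.64))         # error in one student's score
print(group_mean_error(1.0, 0.64, 100))  # error in the mean of 100 students
```

Under these assumptions, a score too noisy to support a decision about one student can still yield a stable institutional average, which is why the reliability requirement depends on the level at which results are used.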

Reporting Overall Scores Versus Subscale Scores 

A related issue is the extent to which a single score can be used to represent students’ writing proficiency and whether 
meaningful subscale scores can be reported to institutions or to examinees. From an institutional perspective, subscales 
can yield valuable information about the relative strengths and weaknesses among students’ proficiency with particular 
aspects of written communication and whether proficiency varies as a function of students’ major, years of college experience, 
and so on. Such information can be used to make improvements to curriculum and instruction. For the individual, 
subscale scores can provide useful feedback about the aspects of writing in which additional practice is needed. Thus, to 
support learning and instruction, provision of subscale scores may provide greater diagnostic information beyond overall 
scores. However, from a measurement perspective, it is only defensible to offer examinees subscale scores if these scores 
are reliable and valid. Haberman (2008) described methods for determining the added value of subscores relative to total 
test scores; these methods should be applied when determining whether or not subscores should be reported to examinees. 
In some cases, subscores do not add useful information for examinees; in those cases, the subscores should not be reported 
(Sinharay & Haberman, 2008). 

Subscale scores can be computed from SR assessments by having sufficient numbers of items assessing each skill of 
interest, such as rhetorical skills or mechanics and usage, such that reliable subscores can be reported for each skill 
(e.g., CAAP provides scores for these two dimensions). For CR assessments, subscale scores can be obtained by applying 
analytic or trait scoring. In contrast to holistic scoring, in which raters assign a single numerical score to the examinee 
based on an overall evaluation of the work, trait scoring requires raters to assign a numerical score for each of the 
qualities (or traits) that are important in the assessment, considered separately. For example, the CLA+ performance task 
scoring rubric asks raters to evaluate students’ responses for three traits: analytic reasoning and problem solving, 
writing effectiveness, and writing conventions. For a given essay, a rater must provide three separate scores. Accordingly, 
trait scoring can provide more detailed, diagnostic information to examinees about their writing compared to a single, 
holistic score, which may not provide detailed information with respect to the writer’s specific weaknesses but rather 
descriptions of the types of weaknesses commonly exhibited by responses receiving the same score (as in GRE Analytical 
Writing). 

In a study comparing the use of trait and holistic scoring in the CUNY CATW, Faggen (2001) found that holistic 
scoring was somewhat more efficient than trait scoring, with raters divided as to which method they preferred. While 
some believed that trait scoring would provide more diagnostic information, evidence of high correlations among the traits 
suggested that trait scoring might not provide more detailed information than a holistic score, provided that the scoring 
criteria were comparable. Thus, holistic scoring is often preferred. In assessments with both SR and CR components, these 
test sections may either be reported separately, combined into a single, weighted proficiency score, or both. 
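
The reasoning about highly correlated traits can be made concrete with a small calculation. The sketch below computes pairwise Pearson correlations among trait scores; the trait names echo the CLA+ example above, but the scores are invented for illustration and the helper names are assumptions.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a trait has no variance")
    return sxy / sqrt(sxx * syy)


def trait_correlations(
    scores: Dict[str, List[float]],
) -> Dict[Tuple[str, str], float]:
    """Correlation for every pair of traits, keyed by (trait_a, trait_b)."""
    return {
        (a, b): pearson(scores[a], scores[b])
        for a, b in combinations(scores, 2)
    }


# Invented trait scores for five essays (1-6 scale), for illustration only.
example_scores = {
    "analytic_reasoning": [2, 3, 4, 5, 6],
    "writing_effectiveness": [2, 3, 4, 5, 5],
    "writing_conventions": [3, 3, 4, 5, 6],
}
for pair, r in trait_correlations(example_scores).items():
    print(pair, round(r, 2))
```

When every pairwise correlation is close to 1, the separate trait scores rank examinees in nearly the same order, so they carry little information beyond a single holistic score.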

Human Scoring Versus Automated Scoring 

Beyond the issue of whether to report an overall proficiency score or multiple subscores is the issue of whether to employ 
automated scoring engines to support — or supplant — the use of human raters to score CRs. Recruiting and providing 
training and calibration to human raters is a time-consuming and often costly process, particularly in the case of large- 
scale assessments administered to thousands of students. Automated scoring engines offer two distinct advantages relative 
to human raters, in terms of reliability and cost. The scores provided by automated essay scoring systems are highly reliable 
(i.e., internally consistent), in that they apply an identical scoring algorithm each time an essay is scored; further, they 
demonstrate high correlations with human ratings (i.e., interrater reliability; Burstein & Chodorow, 2003; Chodorow & 
Burstein, 2004), often comparable to the agreement between two human raters. With respect to cost, automated essay 
scoring systems require little time per essay to apply the scoring model after model development has occurred, making 
the average cost to score one essay minimal compared to that of a human rater. These advantages have led automated 
scoring systems such as the ETS e-rater engine (Burstein & Marcu, 2003) to be used as a check score or second score in 
operational scoring of CR assessments (cf. Deane, 2013). For example, each GRE essay is scored by e-rater and at least 
one human rater, using a holistic scoring rubric with a 1-6 scale. If e-rater and the human rater agree within a certain 
threshold, the human rater’s score is accepted as the final score; however, if the discrepancy between human and e-rater 
scores exceeds that threshold, a second human rater will score the essay, with the final score being the mean of the two 
human scores, rounded to the nearest half-point (ETS, 2013b). In contrast, systems like Criterion (Burstein, Chodorow, 
& Leacock, 2004), as well as WritePlacer and the optional EPP essay, use automated scores as the primary method of 
evaluating students’ writing. 
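
The check-score procedure described above for GRE essays can be sketched as a short function. The agreement threshold is not specified in the text, so it appears here as a required parameter, and the function and argument names are hypothetical rather than part of any ETS system.

```python
import math
from typing import Optional


def round_to_half(value: float) -> float:
    """Round to the nearest half-point, with ties rounded up."""
    return math.floor(value * 2 + 0.5) / 2


def resolve_essay_score(
    human_score: float,
    automated_score: float,
    threshold: float,
    second_human_score: Optional[float] = None,
) -> float:
    """Final score under the check-score rule described in the text.

    If the human and automated scores agree within `threshold`, the human
    score stands. Otherwise a second human rating is required, and the final
    score is the mean of the two human scores rounded to the nearest half-point.
    """
    if abs(human_score - automated_score) <= threshold:
        return float(human_score)
    if second_human_score is None:
        raise ValueError("discrepancy exceeds threshold; a second human rating is needed")
    return round_to_half((human_score + second_human_score) / 2)


# Illustrative values on the 1-6 holistic scale; the 0.5 threshold is an assumption.
print(resolve_essay_score(4, 4.3, threshold=0.5))                        # 4.0
print(resolve_essay_score(4, 5.2, threshold=0.5, second_human_score=5))  # 4.5
```

Note that in this rule the automated score never enters the final score directly; it only determines whether a second human rating is requested.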

Although automated scoring has clear advantages, the decision to use such methods should take into consideration 
the validity of the test scores for a particular intended use. Critiques of the use of automated scoring methods alone 
(or in general) hinge on the notion that the features of text that can be feasibly evaluated using automated methods 
are not necessarily coincident with the features that correspond to good writing, including logical and accurate content 
(Condon, 2013; CCCC, 2009; Perelman, 2012). As noted by Deane (2013), writing is a complex skill, some aspects 
of which can be better captured by automated writing evaluation methods than others. Automated scoring methods rely 
on natural language processing techniques to detect and compute features of the text that are associated with higher 
quality writing. Many of these features are low level, such as nonstandard grammar, spelling, and punctuation, which 
are relatively easy for automated methods to detect, but automated scoring models also attempt to evaluate higher level 
features of writing quality. For example, e-rater is designed to measure both lower and higher level features, categorized 
under dimensions such as organization and development, vocabulary (i.e., word choice), grammar, usage, mechanics, 
and style (i.e., sentence variety). However, as Deane (2013) emphasized, concepts like organization and development as 
they are instantiated in an e-rater model are not interpreted in the way that humans would understand and apply these 
terms; development, for example, is largely a measure of length, rather than the quality of supporting ideas or examples. 
Research by Attali and Powers (2008) suggested that the text features measured by e-rater can be collapsed into three 
factors: fluency (including organization and development), accuracy of text production (i.e., skill in producing error-free 
text), and vocabulary sophistication (i.e., use of low-frequency vocabulary words). These factors do not correspond to 
the social and rhetorical elements of writing that are emphasized in the frameworks reviewed in the first section of this 
article. 

Overall, automated writing evaluation systems seem to measure a restricted version of the construct, which excludes 
some critical communicative elements. Deane (2013) concluded that e-rater and other state-of-the-art scoring engines 
provide a measure of text quality based on surface linguistic features, rather than a measure of writing skill per se. With 
respect to using information from sources, previous research projects have developed effective methods for detecting 
use of explicit citations, plagiarism from sources, and other sourcing related issues (e.g., Britt et al., 2004; Deane, 2014; 
Hastings, Hughes, Magliano, Goldman, & Lawless, 2012), but this development often requires hand-coding of sources 
and training of prompt- and/or task-specific models in order to detect certain anticipated strings in student responses 
(e.g., “according to Blake,” “as Carnegie writes”; Britt et al., 2004). These model building efforts would be required for each 
prompt or test form, making them costly to develop. Further, these efforts have not yet effectively developed automated 
methods for detecting critical analysis and synthesis of sources, and other higher-order skills, such as argumentation and 
the accuracy of content, pose significant challenges as well (e.g., Powers et al., 2002a). Research in these areas is ongoing, 
but existing automated scoring models do not yet provide reliable, valid assessment of these aspects of writing sufficient 
for operational use. 

To assess the features of students’ writing that are important at the higher education level, it is likely that humans will be 
required to read, evaluate, and provide ratings of students’ work with respect to a holistic or analytic rubric that takes into 
account these social and conceptual aspects, until automated scoring methods advance significantly. The CCCC (2009) 
asserted that best practice of writing assessment is to use direct assessment with human raters, particularly in the case of 
high-stakes assessment. While use of automated methods alone may be sufficient for a low-stakes assessment, we concur 
with the CCCC that the greater the assessment stakes, the more important it is to use human scoring. Because writing is 
fundamentally a social act, done to communicate meaning to an audience, it is important that a human reader evaluate 
the extent to which that communication successfully achieved the task goals. 

An Operational Framework for Next-Generation Written Communication Assessment 

Below, we outline a proposed operational framework to support the design of next-generation written communication 
assessments. We present our framework and construct definition, followed by a description of the structural features and 
task types that such an assessment might include. We then describe how the current framework compares with existing 
frameworks and assessments. 

Proposed Framework and Definition 

Informed by the review and synthesis presented in the first and second parts of this article, the proposed operational 
framework and construct definition for written communication appear in Table 7. We have organized the construct 
definition for written communication into four major dimensions: 

• Knowledge of social and rhetorical situations, which concerns the purpose-driven, social nature of all written 
communication, includes the ability to adapt one’s writing to the demands of the specific context, audience, and purpose for 
writing; adherence to genre conventions, such as those for writing arguments or explanations; and skill in creating 
multimodal or multimedia products, using traditional and digital methods of production. 

• Domain knowledge and conceptual strategies, which concerns the use of relevant content knowledge and 
development strategies, includes the ability to develop one’s ideas using sufficient and effective reasons, evidence, and 
examples; presenting those ideas in an organized, logical, and coherent sequence; use of information drawn from 
sources to support one’s ideas without distorting the author’s original meaning; and adherence to disciplinary 
conventions, such as evidentiary or organizational standards. 

• Knowledge of language use and conventions, which concerns the linguistic elements of writing, includes the ability 
to convey meaning clearly by using appropriate word choice, tone, and style, given the purpose of the writing, as 
well as the ability to produce relatively error-free text without substantial flaws in usage, syntax, and mechanics. 

• Knowledge of the writing process, which cuts across the preceding social, conceptual, and linguistic dimensions, 
concerns the various strategies used to support prewriting or planning, drafting, and revision of text, as well as 
reading and appropriately responding to others’ feedback. 

Taken together, these dimensions represent a rather comprehensive view of written communication, spanning social 
and rhetorical, conceptual, and linguistic aspects of producing quality writing, including knowledge of the writing process 
(planning, drafting, and revision) as a major aspect of the framework. Importantly, the purpose-driven social and 
conceptual aspects of writing should be the primary focus of the assessment, in contrast to lower level language elements; further, 
information about students’ proficiency with the writing process could provide useful feedback to both instructors and 
students in the service of supporting teaching and learning. However, these framework dimensions and corresponding 
definitions alone reveal little about how these aspects of writing will be assessed. Below, we propose a set of structural 
features and task types that may be used to evaluate these various aspects of written communication. 
environments) to create written products, 
which may include multimedia elements, 
particularly when communicating complex 
information and ideas. 

Portfolio assessments; innovative item types (e.g., 
select an image that best illustrates your point). 
task-appropriate use of sources (integrated). 





Knowledge of language use and conventions 

Language use: word choice, tone, voice, and 

The ability to compose text that conveys meaning clearly by using appropriate word 

Any direct writing assessment; e-rater scores this construct in terms of word choice (sophistication 

ACCUPLACER: effectiveness of sentence constructions; AWPE: word choice, 

Item and Task Types 

As described in the second section of this article, existing assessments of written communication typically use CR items, 
though several couple these with SR items to compensate for the relatively low reliability of CR items. While many of the 
SR items included in these assessments are traditional discrete items (i.e., single-selection multiple choice [MC]; EPP), 
other assessments use groups of SR items associated with a particular passage (i.e., a set leader and set members; CAAP). 
Single-selection SR items presented in a passage context could also be administered as drop-down menus, drag-and- 
drop, or other more innovative technology-enhanced item types. While more complex technology-enhanced item types 
can sometimes require more effortful and time-consuming processing to read and make a response compared to basic MC 
response types (see Graf, 2009), they may provide evidence about an examinee’s higher order reasoning skills as opposed 
to the passive recognition skills often elicited by traditional MC. Because some item types and tasks are more appropriate 
for measuring some aspects of the writing construct than others, we recommend using various item types to provide a 
more complete view of students’ proficiency with writing, spanning the social and rhetorical, conceptual, and linguistic 
dimensions of this skill. For example, single-selection SR items may be better suited to measure the lower level linguistic 
aspects of writing, while drag-and-drop formats may be useful for assessing students’ use of sources (e.g., by dragging an 
in-text citation to an appropriate location in a passage). 

Structural Features of Items 

Table 8 presents a proposed taxonomy of structural features for items assessing written communication, based on the 
framework and operational definition described above. Consistent with other widely used writing assessments, we propose 
the use of both CR items (i.e., direct writing assessment) and SR items (i.e., indirect writing assessment) to achieve a 
balance of authenticity and psychometric quality (i.e., reliability and validity). Beyond the typical single-selection SR items, 
we propose the use of more interactive structural features (e.g., drop-down menu, select in passage, drag-and-drop) where 
appropriate for measuring the intended construct (e.g., using drag-and-drop to add appropriate supporting evidence 
or citations to a stimulus passage). The use of technology-enhanced item types affords different kinds of measurement 
opportunities compared to traditional MC assessment. For example, such item types could be used to assess multimodal 
composition skills that cannot easily be assessed with typical CR item types, such as selecting an image or graph that best 
supports one’s arguments and inserting it into a particular location in the text. Further, composing text on a computer 
can provide information about the writing process that cannot be captured with traditional CR items (i.e., pencil and 
paper). The use of technology-enhanced items also makes the assessment experience more dynamic and potentially more 
engaging to students, which can provide more robust, valid information about their abilities. 
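
To make the drag-and-drop idea concrete, the sketch below represents one such item as a mapping from draggable objects (e.g., citations) to their intended locations and scores a response against it. The class name, identifiers, and the proportion-correct scoring rule are assumptions made for illustration; the framework itself does not prescribe how such items are scored.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DragAndDropItem:
    """A hypothetical drag-and-drop item: each object has one correct slot."""

    item_id: str
    key: Dict[str, str]  # object id -> correct slot id in the passage

    def score(self, response: Dict[str, str]) -> float:
        """Proportion of objects placed in their correct slots (assumed rule)."""
        if not self.key:
            raise ValueError("item has no scoring key")
        correct = sum(
            1 for obj, slot in self.key.items() if response.get(obj) == slot
        )
        return correct / len(self.key)


# Illustrative item: place two in-text citations at the right points in a passage.
item = DragAndDropItem(
    item_id="citation_placement_1",
    key={"citation_a": "slot_2", "citation_b": "slot_5"},
)
print(item.score({"citation_a": "slot_2", "citation_b": "slot_1"}))  # 0.5
```

An all-or-nothing rule (credit only when every object is placed correctly) would be an equally plausible choice; which rule is appropriate depends on the evidence the item is meant to provide.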

Task Types 

The specific nature of the assessment task(s) is also an important consideration for assessment design. The structural 
features described above could be used to support several task types that we consider promising for measuring the aspects of 
written communication defined in the current framework. Table 9 presents descriptions for several CR and SR assessment 
task types, with their correspondence to the operational framework, and examples of similar assessments. 

Consistent with best practices in writing assessment (CCCC, 2009), CR items should be preferred when possible, because 
these item types permit direct assessment of multiple aspects of students’ writing simultaneously, while SR 
items tend to target a specific aspect of the construct, such as organization or syntactical errors, and then only 
indirectly. Importantly, with respect to use of SR items, we do not advocate the use of discrete, sentence-level traditional 
MC items for an assessment of written communication at the higher education level. Low-level items such as these do 
not represent the skills and competencies that are required of real-world writers, who work with ideas in the context of 
extended discourse rather than discrete and isolated sentences. Therefore, if these linguistic-level skills are to be assessed, 
they should be assessed in the context of extended written discourse, which examinees must either read, respond to, 
and revise, or produce themselves in a CR task. In addition, to the extent possible, the tasks 
should be introduced in such a way that they represent an authentic context and purpose for writing, with a specified 
audience to be addressed. For example, a set of revision-in-passage-context items could be framed as a peer-editing task 
or as responding to feedback from an instructor rather than an abstract task done solely for the purpose of taking the 
assessment. Such tasks may be more reflective of the real-world settings in which college-level writers engage in the 
practice of writing. 

Table 8 Descriptions of Structural Features of Written Communication Items 

Item type: Description 

CR prompt: Writing-based CR item, in which examinees compose an open-ended response to a prompt, which may or may not include source texts. 

Set leader: Stimulus (i.e., passage) for which there are one or more items (set members) that are based on the stimulus content. 

Set member, single selection MC: Stem with multiple answer choices, of which one is the correct response; displayed along with set leader for reference. Revision-in-passage context items follow this format (e.g., CAAP). 

Single selection MC: A stem with multiple answer choices, of which one could be a correct response. 

Multiple selection MC: A stem with multiple answer choices, of which two or more could be a correct response. 

Drop-down menu: A variation of a traditional MC item, where one answer choice is selected via a drop-down menu. 

Select in passage (single selection/multiple selection): Item where the answer choices are a predefined set of words, phrases, sentences, or paragraphs within a set leader. When the test taker clicks on a selection, the word/sentence is highlighted in the passage. If only one answer, use Select in Passage SS. If two or more answers, use Select in Passage MS. 

Drag-and-drop: An examinee selects objects (i.e., text segments, citations, or images) and places them in a specific location or order within a text. 

Note. CR = constructed response; MC = multiple choice; CAAP = Collegiate Assessment of Academic Proficiency. 


Unique Features of this Framework 

The framework and construct definition presented in this article are informed by current research on writing and 
writing instruction, which views learning to write as a process of socialization into a particular set of practices for achieving 
particular social and rhetorical goals (e.g., presenting a scientific argument or advancing a particular historical or 
literary interpretation), and by current higher education frameworks, which recognize that the construct of writing must 
be updated to reflect the place of written communication in a contemporary social and technological context. The 
abilities to produce multimedia compositions, to synthesize information from a wide variety of information sources, and 
to convey complex information effectively and succinctly are increasingly important for success in both academic and 
workforce domains in the 21st century. Consistent with the developmental competency model of literacy that underlies 
the design of CBAL assessments in K-12 (Deane, 2011; Deane et al., 2011; Sabatini et al., 2013), we conceptualize 
written communication as involving the coordinated recruitment of social, conceptual, and linguistic (i.e., discourse, verbal, 
and print) representations, on which the writer’s cognitive processes operate. Fluency with lower level linguistic 
processes frees up cognitive resources for engaging in the conceptual and social aspects of the writing. We include rhetorical 
aspects of writing in the social dimension, as rhetorical considerations are a part of the social and communicative goals 
of writing. By addressing social, conceptual, linguistic, and process-level dimensions of writing, we present a 
comprehensive operational framework that can be used to evaluate existing assessments and to support the development of new 
assessments. 

We have included knowledge of composing in multiple modes and forms (including use of technological tools to 
compose text) under the social and rhetorical dimension of the writing construct, and this represents a unique feature of the 
current assessment framework. It is important to note that while the vast majority of frameworks reviewed mentioned 
this skill as important for higher education, particularly in the 21st century, none of the assessments of written 
communication we reviewed made any attempt to provide evidence of students’ proficiency with creating multimedia compositions 
or using technology-enhanced composition methods. Skill in composing multiple types or forms of text 
(including multimedia, PowerPoint presentations, etc.) as a writing outcome is typically assessed through portfolio assessment 
methods, if at all. Most assessments of writing skill do not evaluate this dimension of student writing explicitly. Specific 
information about student proficiency with multimedia composition skills might also be provided by an assessment of 
another, related construct, such as digital or information and communication technology (ICT) literacy, which concerns 
the extent to which students can use technological tools to compose multimodal communication, such as writing an e-mail 
to a colleague explaining data displayed in a graph (e.g., Katz & Macklin, 2007). 

Note. CR = constructed response; SR = selected response; GRE = Graduate Record Examinations; CAAP = Collegiate Assessment of Academic Proficiency; EPP = ETS Proficiency 
Profile; CLA = Collegiate Learning Assessment; CLEP = College Level Examination Program; CUNY CATW = City University of New York/CUNY Assessment Test in Writing. 
a The current GRE Analytical Writing measure does not require examinees to address their response to a specific audience, so the audience awareness dimension is not evaluated by this 
assessment. 
b Argument critique tasks do not typically require examinees to summarize, paraphrase, or quote from the stimulus prompt, nor do they require examinees to cite sources. 
Therefore, this task does not fully correspond to the definition of use of sources and textual evidence as defined in the current framework. 

We also include use of sources and adherence to the conventions of argument and expository genres, which are
particularly critical skills at the higher education level, yet are not explicit components of many assessment frameworks.
Like use of sources, disciplinary considerations are treated as part of the conceptual aspects of writing, because
they directly affect the writer’s interaction with the content; however, our goal is to design a writing assessment that can 
inform the evaluation of general student learning outcomes across curricular or disciplinary boundaries, and, thus, our 
proposed operational definition and task types do not directly address this aspect of the framework. Rather, adherence 
to disciplinary conventions could be assessed locally, within a particular school or department, through some form of 
portfolio assessment, if disciplinary writing assessment is sought beyond typical classroom assessment practices. 

In sum, the proposed framework offers several advantages, which support its use for developing written communication 
assessments at the higher education level. The framework captures multiple dimensions of writing, informed by a review 
of extant frameworks and literature from the learning sciences and the higher education writing community. It affords 
the use of multiple assessment formats, including extended CRs, traditional SR items, and more innovative item types. 
The use of technology-enhanced item types as proposed here has the potential to provide more robust measurement of 
student proficiency by obtaining evidence of skills that are difficult to measure with traditional methods and by potentially 
enhancing student engagement in the assessment experience. Such item types have been developed and administered 
in the context of assessing the language skills of English learners; these designs could be adapted for use in measuring 
undergraduates’ proficiency with college-level writing tasks. Further, combining a direct writing assessment with multiple 
indirect items designed to assess aspects of the construct that are not covered by the specific CR prompt can provide a 
balance of authenticity and technical quality. This framework can also support the design of assessments that are reliable 
at the group or at the student level, depending on the intended purpose of the assessment, though the specific degree of 
reliability obtained is an empirical question, to be revealed through pilot testing. 

Conclusions 

Written communication has been identified as one of the most important learning outcomes among higher education 
institutions, as well as employers. Frameworks from higher education, educational institutions, national associations, the 
workforce, K-12 standards, and the research literature have each offered definitions of proficiency with written
communication. At the higher education level, in particular, writing should involve critical and reflective engagement with others’
ideas, development and support of one’s own ideas, skill in producing compelling arguments directed to an audience, and 
fluency with producing coherent and logical written text that is free of errors. The operational definition proposed in the 
current article emphasizes the intersection of social, conceptual, and linguistic processes in the writing process,
providing a comprehensive view of what skilled written communication involves, which can be used to obtain more complete
evidence of students’ proficiency with various aspects of writing. This framework aligns with current writing assessments 
but extends beyond current offerings by emphasizing the authentic social contexts and tasks in which real-world written 
communication skills will be deployed. 

Notes

1 Note that not all of the frameworks provided an explicit definition of writing or written communication; therefore, in some cases, 
a definition of the targeted construct was inferred from the statements describing the desired student learning outcomes (e.g., 
rubric statements) relevant for a particular aspect of writing skill. 

2 Related competencies appear in other frameworks under the heading of information literacy or critical thinking, which deal with
evaluating the relevance, reliability, and credibility of various information sources and using those sources to make and defend 
arguments, develop solutions to problems, and so forth. 

3 Second-language communication was mentioned across several frameworks (ATC21S, LEAP, BOLOGNA, DQP), but we do not 
deal with this issue in detail, as second-language learning is outside the scope of written communication per se.

4 The linguistic aspects of literacy can be further decomposed into discourse, verbal, and print levels of representation. The 
discourse representation includes information about text structure, organization, and the situation being described in the text 
(i.e., a situation model of the text). The verbal level of representation includes information about the meaning and usage of words 
(i.e., vocabulary knowledge). The print level includes representations of print conventions (i.e., knowledge of spelling, 
morphology, and phonology). Facility with print, verbal, and discourse-level representations is required for skillful command of 
the linguistic aspects of writing. 

5 The CLA+ now includes an SR section, but these items assess students’ skill in scientific literacy, critical analysis and evaluation 
of sources, and critiquing arguments, rather than writing skill. 

6 Of course, compared to SR tests, CR format assessments often have lower reliability for other reasons as well, such as failure to
achieve high reliability in CR scoring, which are separate from concerns about test length per se.

7 The GRE revised General Test was implemented after August 1, 2011. 

8 The presence of such items on college placement tests such as the ACCUPLACER and COMPASS, which are used to determine
whether students demonstrate readiness for college-level writing instruction or require remediation through developmental
coursework, is consistent with this notion.